Nucleotide Sequence of Escherichia coli
Pathogenicity Islands 

Background of the Invention 

Field of the Invention 

The present invention relates to novel genes located in two chromosomal
regions within E. coli that are associated with virulence. These chromosomal
regions are known as pathogenicity islands (PAIs).

Related Background Art 

Escherichia coli (E. coli) is a normal inhabitant of the intestine of humans
and various animals. Pathogenic E. coli strains are able to cause infections of the
intestine (intestinal E. coli strains) and of other organs such as the urinary tract
(uropathogenic E. coli) or the brain (extraintestinal E. coli). Intestinal pathogenic
E. coli are a well-established and leading cause of severe infantile diarrhea in the
developing world. Additionally, cases of newborn meningitis and sepsis have
been attributed to E. coli pathogens.

In contrast to non-pathogenic isolates, pathogenic E. coli produce
pathogenicity factors which contribute to the ability of strains to cause infectious
diseases (Muhldorfer, I. and Hacker, J., Microb. Pathogen. 16:171-181 1994).
Adhesins facilitate binding of pathogenic bacteria to host tissues. Pathogenic
E. coli strains also express toxins, including haemolysins, which are involved in
the destruction of host cells, and surface structures such as O-antigens, capsules
or membrane proteins, which protect the bacteria from the action of phagocytes
or the complement system (Ritter, et al., Mol. Microbiol. 17:109-212 1995).

The genes coding for pathogenicity factors of intestinal E. coli are located
on large plasmids, on phage genomes or on the chromosome. In contrast to intestinal
E. coli, pathogenicity determinants of uropathogenic and other extraintestinal E.
coli are, in most cases, located on the chromosome. Id.



Large chromosomal regions in pathogenic bacteria that encode adjacently
located virulence genes have been termed pathogenicity islands ("PAIs"). PAIs
are large fragments of DNA comprising a group of virulence
genes that behave as a distinct molecular and functional unit, much like an island
within the bacterial chromosome. For example, intact PAIs appear to transfer
between organisms and confer complex virulence properties on the recipient
bacteria.

Chromosomal PAIs in bacterial cells have been described in increasing
detail over recent years. For example, J. Hacker and co-workers described two
large, unstable regions in the chromosome of uropathogenic Escherichia coli
strain 536 as PAI-I and PAI-II (Hacker, J., et al., Microbiol. Pathog. 8:213-25
1990). Hacker found that the PAI-I and PAI-II virulence regions can be
lost by spontaneous deletion due to recombination events. Both of these PAIs
were found to encode multiple virulence genes, and their loss resulted in reduced
hemolytic activity, serum resistance, mannose-resistant hemagglutination,
uroepithelial cell binding, and mouse virulence of the E. coli (Knapp, S., et al., J.
Bacteriol. 168:22-30 1986). Therefore, pathogenicity islands are characterized
by their ability to confer complex virulence phenotypes on bacterial cells.

In addition to E. coli, specific deletion of large virulence regions has been
observed in other bacteria such as Yersinia pestis. For example, Fetherston and
co-workers found that spontaneous deletion of a 102-kb region of the Y. pestis
chromosome resulted in the loss of many Y. pestis virulence phenotypes
(Fetherston, J.D. and Perry, R.D., Mol. Microbiol. 13:697-708 1994; Fetherston,
et al., Mol. Microbiol. 6:2693-704 1992). In this instance, the deletion appeared
to be due to recombination within 2.2-kb repetitive elements at both ends of the
102-kb region.

It is possible that deletion of PAIs may benefit the organism by
modulating bacterial virulence or genome size during infection. PAIs may also
represent foreign DNA segments, acquired during bacterial evolution, that
conferred important pathogenic properties on the bacteria. Flanking
repeats, such as those observed in Y. pestis, may suggest a common mechanism
by which these virulence genes were integrated into bacterial chromosomes.

Integration of virulence genes into bacterial chromosomes was further
elucidated by the discovery and characterization of the locus of enterocyte
effacement (the LEE locus) in enteropathogenic E. coli (McDaniel, et al., Proc.
Natl. Acad. Sci. (USA) 92:1664-8 1995). The LEE locus spans 35 kb and
encodes many genes required for these bacteria to "invade" and degrade the apical
structure of enterocytes, causing diarrhea. Although the LEE and PAI-I loci
encode different virulence genes, these elements are located at exactly the same site
in the E. coli genome and contain the same DNA sequence within their right-hand
ends, suggesting a common mechanism for their insertion.

Besides being found in enteropathogenic E. coli, the LEE element is also
present in rabbit diarrheal E. coli, Hafnia alvei, and Citrobacter freundii biotype
4280, all of which induce attaching and effacing lesions on the apical face of
enterocytes. The LEE locus appears to be inserted in the bacterial chromosome
as a discrete molecular and functional virulence unit in the same fashion as PAI-I,
PAI-II, and the Yersinia PAI.

Along these same lines, a 40-kb Salmonella typhimurium PAI encoding genes
required for Salmonella entry into nonphagocytic epithelial cells of the intestine
was characterized on the bacterial chromosome (Mills, D.M.,
et al., Mol. Microbiol. 15:749-59 1995). Like the LEE element, this PAI confers
on Salmonella the ability to invade intestinal cells, and hence may likewise be
characterized as an "invasion" PAI.

The pathogenicity islands described above all possess the common feature 
of conferring complex virulence properties to the recipient bacteria. However, 
they may be separated into two types by their respective contributions to 
virulence. PAI-I, PAI-II, and the Y. pestis PAI confer multiple virulence
phenotypes, while the LEE and the S. typhimurium "invasion" PAI encode many 
genes specifying a single, complex virulence process. 

It is advantageous to characterize closely related bacteria that do or
do not contain a PAI by isolating a discrete molecular and functional unit
on the bacterial chromosome. Since the presence versus the absence of essential
virulence genes can often distinguish closely related virulent versus avirulent
bacterial strains or species, experiments have been conducted to identify virulence
loci and potential PAIs by isolating DNA sequences that are unique to virulent
bacteria (Bloch, C.A., et al., J. Bacteriol. 176:7121-5 1994; Groisman, E.A.,
EMBO J. 12:3779-87 1993).

At least two PAIs are present in E. coli J96. These PAIs, PAI IV and PAI
V, are linked to tRNA loci but at sites different from those occupied by other
known E. coli PAIs (Swenson et al., Infect. Immun. 64:3736-3743 (1996)).

The era of true comparative genomics has been ushered in by high-throughput
genomic sequencing and analysis. The first two complete bacterial
genome sequences, those of Haemophilus influenzae and Mycoplasma genitalium,
were recently described (Fleischmann, R.D., et al., Science 269:496 (1995);
Fraser, C.M., et al., Science 270:397 (1995)). Large-scale DNA sequencing
efforts have also produced an extensive collection of sequence data from
eukaryotes, including Homo sapiens (Adams, M.D., et al., Nature 377:3 (1995))
and Saccharomyces cerevisiae (Levy, J., Yeast 10:1689 (1994)).

The need continues to exist for the application of high-throughput
sequencing and analysis to the study of genomes and subgenomes of infectious
organisms. Further, a need exists for genetic markers that can be employed to
distinguish closely related virulent and avirulent strains of a given bacterium.

Summary of the Invention 

The present invention is based on the high-throughput, random
sequencing of cosmid clones covering two pathogenicity islands (PAIs) of
uropathogenic Escherichia coli strain J96 (O4:K6; E. coli J96). PAIs are large
fragments of DNA which comprise pathogenicity determinants. PAI IV is located
at approximately 64 min (near pheV) on the E. coli chromosome and is greater
than 170 kilobases in size. PAI V is located at approximately 94 min (at pheR)
on the E. coli chromosome and is approximately 106 kb in size. These PAIs
differ in location from the PAIs described by Hacker and colleagues for
uropathogenic strain 536 (PAI I, 82 minutes (selC), and PAI II, 97 minutes
(leuX)).

The location of the PAIs relative to one another and the cosmid clones
covering the J96 PAIs is shown in Figure 1. The present invention relates to the
nucleotide sequences of 142 fragments of DNA (contigs) covering the PAI IV and
PAI V regions of the E. coli J96 chromosome. The nucleotide sequences shown
in SEQ ID NOs: 1 through 142 were obtained by shotgun sequencing eleven E.
coli J96 subclones, which were deposited in two pools on September 23, 1996 at
the American Type Culture Collection, 12301 Park Lawn Drive, Rockville,
Maryland 20852, and given accession numbers 97726 (includes 7 cosmid clones
covering PAI IV) and 97727 (includes 4 cosmid clones covering PAI V). The
deposited sets or "pools" of clones are more fully described in Example 1. In
addition, E. coli strain J96 was also deposited at the American Type Culture
Collection on September 23, 1996, and given accession number 98176.

Three hundred fifty-one open reading frames have thus far been identified
in the 142 contigs described by SEQ ID NOs: 1 through 142. Thus, the present
invention is directed to isolated nucleic acid molecules comprising open reading
frames (ORFs) encoding E. coli J96 PAI proteins, and fragments of said nucleic
acid molecules.

The present invention also relates to variants of the nucleic acid molecules 
of the present invention, which encode portions, analogs or derivatives of E. coli 
J96 PAI proteins. Further embodiments include isolated nucleic acid molecules 
comprising a polynucleotide having a nucleotide sequence at least 90% identical, 
and more preferably at least 95%, 96%, 97%, 98% or 99% identical, to the
nucleotide sequence of an E. coli J96 PAI ORF described herein, and fragments 
of said nucleic acid molecules. 

The present invention also relates to recombinant vectors, which include 
the isolated nucleic acid molecules of the present invention and fragments 
30 thereof, host cells containing the recombinant vectors, as well as methods for 
making such vectors and host cells for E. coli J96 PAI protein production by
recombinant techniques. 

The invention further provides isolated polypeptides encoded by the E.
coli J96 PAI ORFs or fragments of said ORFs. It will be recognized that some
amino acid sequences of the polypeptides described herein can be varied without 
significant effect on the structure or function of the protein. If such differences 
in sequence are contemplated, it should be remembered that there will be critical 
areas on the protein which determine activity. In general, it is possible to replace 
residues which form the tertiary structure, provided that residues performing a 
similar function are used. In other instances, the type of residue may be 
completely unimportant if the alteration occurs at a non-critical region of the 
protein. 

In another aspect, the invention provides a peptide or polypeptide 
comprising an epitope-bearing portion of a polypeptide of the invention. The 
epitope-bearing portion is an immunogenic or antigenic epitope useful for raising 
antibodies. 

The invention further provides a vaccine comprising one or more E. coli 
J96 PAI antigens together with a pharmaceutically acceptable diluent, carrier, or 
excipient, wherein the one or more antigens are present in an amount effective to 
elicit protective antibodies in an animal to pathogenic E. coli, such as strain J96. 

The invention also provides a method of eliciting a protective immune 
response in an animal comprising administering to the animal the above- 
described vaccine. 

The invention further provides a method for identifying pathogenic E. coli 
in an animal comprising analyzing tissue or body fluid from the animal for one 
or more of: 

(a) polynucleic acids encoding an open reading frame listed 
in Tables 1-4 or a fragment of said polynucleic acid; 

(b) full length or mature polypeptides encoded by an open
reading frame listed in Tables 1-4; or

(c) antibodies specific to polypeptides encoded by an open
reading frame listed in Tables 1-4.

The invention further provides a nucleic acid probe for the detection of the 
presence of one or more E. coli PAI nucleic acids (nucleic acids encoding one or 
more ORFs as listed in Tables 1-4) in a sample from an individual comprising 
one or more nucleic acid molecules sufficient to specifically detect under 
stringent hybridization conditions the presence of the above-described molecule 
in the sample. 

The invention also provides a method of detecting E. coli PAI nucleic 
acids in a sample comprising: 

a) contacting the sample with the above-described nucleic acid probe, 
under conditions such that hybridization occurs, and 

b) detecting the presence of the probe bound to an E. coli PAI nucleic
acid.

The invention further provides a kit for detecting the presence of one or 
more E. coli PAI nucleic acids in a sample comprising at least one container
means having disposed therein the above-described nucleic acid probe. 

The invention also provides a diagnostic kit for detecting the presence of 
pathogenic E. coli in a sample comprising at least one container means having 
disposed therein one or more of the above-described antibodies. 

The invention also provides a diagnostic kit for detecting the presence of 
antibodies to pathogenic E. coli in a sample comprising at least one container 
means having disposed therein one or more of the above-described antigens. 

Brief Description of the Figures 

Figure 1 is a schematic diagram of cosmid clones derived from E. coli 
J96 pathogenicity island and map positions of known E. coli PAIs (not drawn to 
scale). The gray bar represents the E. coli K-12 chromosome with minute 
demarcations of PAI junction points located above the bar. E. coli J96 
overlapping cosmid clones are represented by hatched bars (overlap not drawn to 
scale) with positions of hly, pap, and prs operons indicated above the bar. The PAIs
and estimated sizes are shown above and below the K-12 chromosome map. 

Figure 2 is a block diagram of a computer system 102 that can be used to
implement the computer-based systems of the present invention.

Detailed Description of the Invention

The present invention is based on high-throughput, random sequencing
of a uropathogenic strain of Escherichia coli. The DNA sequences of contiguous
DNA fragments covering the pathogenicity islands PAI IV (also referred to as
PAI J96(pheV)) and PAI V (also referred to as PAI J96(pheU)) from the chromosome of
the E. coli uropathogenic strain J96 (O4:K6) were determined. The sequences
were used for DNA and protein sequence similarity searches of public sequence databases.

The primary nucleotide sequences generated by shotgun sequencing
cosmid clones of the PAI IV and PAI V regions of the E. coli chromosome are
provided in SEQ ID NOs:1 through 142. These sequences represent contiguous
fragments of the PAI DNA. As used herein, the "primary sequence" refers to the
nucleotide sequence represented by the IUPAC nomenclature system. The
present invention provides the nucleotide sequences of SEQ ID NOs:1 through
142, or representative fragments thereof, in a form that can be readily used,
analyzed, and interpreted by a skilled artisan. Within these 142 sequences,
351 open reading frames (ORFs) have thus far been identified; they are described
in greater detail below.

As used herein, a "representative fragment" refers to E. coli J96 PAI
protein-encoding regions (also referred to herein as open reading frames or
ORFs), expression modulating fragments, and fragments that can be used to
diagnose the presence of E. coli in a sample. A non-limiting identification of
such representative fragments is provided in Tables 1 through 6, preferably in
Tables 1 through 4. As described in detail below, representative fragments of the
present invention further include nucleic acid molecules having a nucleotide
sequence at least 95% identical, preferably at least 96%, 97%, 98%, or 99%
identical, to an ORF identified in Tables 1 through 6, or more preferably Tables
1 through 4.

As indicated above, the nucleotide sequence information provided in SEQ
ID NOs:1 through 142 was obtained by sequencing cosmid clones covering the
PAIs located on the chromosome of E. coli J96 using a megabase shotgun
sequencing method. The sequences provided in SEQ ID NOs:1 through 142 are
highly accurate, although not necessarily 100% perfect, representations of the
nucleotide sequences of contiguous stretches of DNA (contigs) which include the
ORFs located on the two pathogenicity islands of E. coli J96. As discussed in
detail below, using the information provided in SEQ ID NOs:1 through 142 and
in Tables 1 through 6 together with routine cloning and sequencing methods, one
of ordinary skill in the art would be able to clone and sequence all "representative
fragments" of interest, including open reading frames (ORFs) encoding a large
variety of E. coli J96 PAI proteins. In rare instances, this may reveal a nucleotide
sequence error in the nucleotide sequences disclosed in SEQ ID NOs: 1
through 142. Thus, once the present invention is made available (i.e., once the
information in SEQ ID NOs: 1 through 142 and in Tables 1 through 6 is made
available), resolving a rare sequencing error would be well within the skill of the
art. Nucleotide sequence editing software is publicly available. For example,
Applied Biosystems' (AB) AutoAssembler™ can be used as an aid during visual
inspection of nucleotide sequences.

Even if all of the rare sequencing errors were corrected, it is predicted that 
the resulting nucleotide sequences would still be at least about 99.9% identical 
to the reference nucleotide sequences in SEQ ID NOs: 1 through 142. Thus, the 
present invention further provides nucleotide sequences that are at least 99.9% 
identical to the nucleotide sequence of SEQ ID NOs: 1 through 142 in a form 
which can be readily used, analyzed and interpreted by the skilled artisan. 
Methods for determining whether a nucleotide sequence is at least 99.9%
identical to a reference nucleotide sequence of the present invention are described
below.



WO 98/22575 



PCT/US97/21347




Nucleic Acid Molecules 

The present invention is directed to isolated nucleic acid fragments of the
PAIs of E. coli J96. Such fragments include, but are not limited to, nucleic acid
molecules encoding polypeptides, nucleic acid molecules that modulate the
expression of an operably linked ORF (hereinafter expression modulating
fragments (EMFs)), and nucleic acid molecules that can be used to diagnose the
presence of E. coli in a sample (hereinafter diagnostic fragments (DFs)).

By "isolated nucleic acid molecule(s)" is intended a nucleic acid
molecule, DNA or RNA, that has been removed from its native environment. For
example, recombinant DNA molecules contained in a vector are considered
isolated for the purposes of the present invention. Further examples of isolated
DNA molecules include recombinant DNA molecules maintained in heterologous
host cells, purified (partially or substantially) DNA molecules in solution, and
nucleic acid molecules produced synthetically. Isolated RNA molecules include
in vitro RNA transcripts of the DNA molecules of the present invention.

In one embodiment, E. coli J96 PAI DNA can be mechanically sheared
to produce fragments about 15-20 kb in length, which can be used to generate an
E. coli J96 PAI DNA library by insertion into lambda clones as described in
Example 1 below. Primers flanking an ORF described in Tables 1 through 6 can
then be generated using the nucleotide sequence information provided in SEQ ID
NOs: 1 through 142. The polymerase chain reaction (PCR) is then used to
amplify and isolate the ORF from the lambda DNA library. PCR cloning is well
known in the art. Thus, given SEQ ID NOs: 1 through 142 and Tables 1 through
6, it would be routine to isolate any ORF or other representative fragment of the
E. coli J96 PAI subgenomes. Isolated nucleic acid molecules of the present
invention include, but are not limited to, single-stranded and double-stranded
DNA, single-stranded RNA, and complements thereof.
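As a rough illustration of how primer sequences could be read off a contig once an ORF's coordinates are known, the Python sketch below takes a fixed-length stretch immediately upstream of the ORF as the forward primer and the reverse complement of the stretch immediately downstream as the reverse primer. The helper names, the 1-based inclusive coordinate convention, and the 20-nt default length are assumptions made for illustration; a practical primer design would also check melting temperature, GC content, and self-complementarity, which this sketch does not do.

```python
# Complement table for unambiguous bases only (IUPAC ambiguity codes are not handled).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_primers(contig: str, start: int, stop: int, length: int = 20) -> tuple[str, str]:
    """Hypothetical helper: take primer sequences just outside a forward-strand ORF.

    start and stop are assumed to be 1-based, inclusive contig coordinates with
    start < stop. Both returned primers are written 5'->3'.
    """
    if not 1 <= start < stop <= len(contig):
        raise ValueError("ORF coordinates fall outside the contig")
    upstream_begin = start - 1 - length  # 0-based index of the first forward-primer base
    downstream_end = stop + length       # exclusive 0-based end of the downstream flank
    if upstream_begin < 0 or downstream_end > len(contig):
        raise ValueError("not enough flanking sequence for primers of this length")
    forward = contig[upstream_begin:start - 1]
    # The reverse primer anneals to the strand shown, so it is the reverse
    # complement of the sequence immediately downstream of the ORF.
    reverse = reverse_complement(contig[stop:downstream_end])
    return forward, reverse


# Toy contig: 4-nt flank + 9-nt ORF (positions 5-13) + 4-nt flank.
fwd, rev = flanking_primers("AAAAATGCCCTAAGGGG", start=5, stop=13, length=4)
print(fwd, rev)  # AAAA CCCC
```

Taking primers directly adjacent to the ORF keeps the amplified product to the ORF itself; in practice the primer positions are usually shifted to satisfy the thermodynamic criteria mentioned above.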

Tables 1 through 6 herein describe ORFs in the E. coli J96 PAI cosmid 
clone library. 
Tables 1 and 3 list, for PAI IV and PAI V, respectively, a number of ORFs
that putatively encode a recited protein based on homology matching with protein
sequences from an organism listed in the Table. Tables 1 and 3 indicate the
location (i.e., the position) of each ORF by reference to its position within one of
the 142 E. coli J96 contigs described in SEQ ID NOs: 1 through 142. Column 1
of Tables 1 and 3 provides the Sequence ID Number (SEQ ID NO) of the contig
in which a particular open reading frame is located. Column 2 numerically
identifies a particular ORF on a particular contig (SEQ ID NO), since many
contigs comprise a plurality of ORFs. Columns 3 and 4 indicate an ORF's
position in the nucleotide sequence (contig) provided in SEQ ID NOs: 1 through
142 by referring to start and stop positions in the contig sequence.

One of ordinary skill in the art will appreciate that the ORFs may be 
oriented in opposite directions in the E. coli chromosome. This is reflected in 
columns 3 and 4. For these ORFs, the sense strand is complementary to the 
actual sequence given. The corresponding sense-strand of the ORF must be read 
as the 5'-3' complement of the antisense strand actually shown in the Sequence 
Listing, wherein the location is specified 3'-5'. 
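The reading rule for oppositely oriented ORFs can be sketched in Python as follows. The sketch assumes, without confirmation from the text, that the coordinates in columns 3 and 4 are 1-based and inclusive and that a start position larger than the stop position marks an ORF on the complementary strand; `extract_orf` is a hypothetical helper, not part of any published tool.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_orf(contig: str, start: int, stop: int) -> str:
    """Return an ORF's sense-strand sequence, 5'->3'.

    Assumed convention (not stated in the text): coordinates are 1-based and
    inclusive, and start > stop marks an ORF on the strand complementary to
    the one shown in the Sequence Listing.
    """
    if start <= stop:
        return contig[start - 1:stop]
    # Opposite orientation: slice the listed span, then read its complement 5'->3'.
    segment = contig[stop - 1:start]
    return segment.translate(_COMPLEMENT)[::-1]


contig = "CCTTAGGGCATCC"
print(extract_orf(contig, 11, 3))  # ATGCCCTAA
```

Here the listed strand carries TTAGGGCAT at positions 3 to 11, and reading its complement 5' to 3' recovers an ORF that starts with ATG and ends with TAA.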

Column 5 provides a database accession number to a homologous protein
identified by a similarity search of public sequence databases (see, infra).
Column 6 describes the matching protein sequence, and the source organism is
identified in brackets. Column 7 of Tables 1 and 3 indicates the percent similarity
of the protein sequence encoded by an ORF to the corresponding protein
sequence from the organism appearing in parentheses in the sixth column.
Column 8 of Tables 1 and 3 indicates the percent identity of the protein sequence
encoded by an ORF to the corresponding protein sequence from the organism
appearing in parentheses in the sixth column. The concepts of percent identity
and percent similarity of two polypeptide sequences are well understood in the art
and are described in more detail below. Identified genes can frequently be
assigned a putative cellular role category adapted from Riley (see, Riley, M.,
Microbiol. Rev. 57:862 (1993)). Column 9 of Tables 1 and 3 provides the
nucleotide length of the open reading frame.
Tables 2 and 4, below, provide ORFs of E. coli J96 PAI IV and PAI V,
respectively, that did not elicit a homology match with a known sequence from
either E. coli or another organism. As above, the first column in Tables 2 and 4
provides the contig in which the ORF is located and the second column
numerically identifies a particular ORF in a particular contig. Columns 3 and 4
identify an ORF's position in one of SEQ ID NOs: 1 through 142 by reference to
start and stop nucleotides.

Tables 5 and 6, below, provide the E. coli J96 PAI IV ORFs and PAI V 
ORFs, respectively, identified by the present inventors that provided a significant 
match to a previously published E. coli protein. Columns 1-6 correspond to 
columns 1-6 appearing in Tables 1 and 3. Column 7 indicates the percent identity 
of the protein sequence encoded by an ORF to the corresponding protein 
sequence from the organism appearing in parentheses in the sixth column. 
Column 8 indicates the length of the high-scoring segment pair (HSP). Column 
9 provides the nucleotide length of the open reading frame. 

As used herein, "open reading frame" or "ORF" refers to the nucleotide 
sequences as described in Tables 1 through 6. In Tables 1 through 6, each ORF 
is designated by a nucleotide sequence start position and stop position according 
to numbering of contig nucleotides in the Sequence Listing provided (Contig ID 
= SEQ ID NO). 

In a first embodiment, the invention comprises a nucleotide sequence
described in Tables 1 through 4 which begins with the nucleotide following the
last nucleotide of an upstream stop codon (the first nucleotide of the "ORF") and
includes an initiation codon, in-frame putative polypeptide-encoding sequence, and
the nucleotides of an in-frame stop codon.

In a second embodiment, the invention comprises a nucleotide sequence
of Tables 1 through 4 which contains an initiation codon (e.g., a methionine or
valine codon) at its 5' end and a stop codon at its 3' end. The sequences of
this embodiment are present within the nucleotide sequences described in Tables
1 through 4 by start and stop position as numbered in the Sequence Listing. To
determine the 5' start position of this embodiment, one simply reads 5' to 3' from
the designated 5' end position until an initiation codon is found.
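The scan described for the second embodiment can be sketched as a codon-by-codon search. This minimal Python version assumes that reading proceeds in frame from the first nucleotide of the ORF and treats ATG (methionine) and GTG (valine) as the initiation codons mentioned above; the function name is illustrative.

```python
# Initiation codons named in the text: ATG (methionine) and GTG (valine).
START_CODONS = ("ATG", "GTG")


def find_initiation_codon(orf: str, start_codons=START_CODONS) -> int:
    """Scan codon by codon, 5'->3', from the designated 5' end of an ORF.

    Returns the 0-based offset of the first in-frame initiation codon, or -1.
    Reading in frame from the first ORF nucleotide is an assumption here.
    """
    seq = orf.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in start_codons:
            return i
    return -1


orf = "CCCGTGAAATAA"
offset = find_initiation_codon(orf)
print(offset, orf[offset:])  # 3 GTGAAATAA
```

The slice starting at the returned offset corresponds to a sequence of the second embodiment: it opens with an initiation codon and still ends with the stop codon.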

In a third embodiment, the invention comprises a nucleotide sequence of 
the second embodiment, except that the 3' stop codon is not present. 

In a fourth embodiment, the invention comprises a nucleotide sequence 
encoding a putative protein which is a sequence of Tables 1 through 4 excluding 
sequence encoding amino acids subject to removal by post-translational 
processing and sequences 3' of the last codon coding for an amino acid present 
in the putative polypeptide (e.g., sequences not containing the stop codon and 
encoding the mature form of the polypeptide).

Certain embodiments of the invention may therefore either include or 
exclude initiation codons for methionine or valine and either include or exclude 
the stop codon. 
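Whether an embodiment includes or excludes the initiation and stop codons amounts to trimming at most one codon from each end of the coding sequence. The hypothetical helper below shows this for a sequence assumed to start with its initiation codon and to be a whole number of codons long; removal of residues lost by post-translational processing (the fourth embodiment) depends on the particular protein and is not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def trim_codons(cds: str, keep_start: bool = True, keep_stop: bool = True) -> str:
    """Return a coding sequence with its initiation and/or stop codon removed.

    Assumes cds begins with its initiation codon and is a whole number of codons;
    the stop codon is only removed if the final codon actually is one.
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    seq = cds
    if not keep_stop and seq[-3:].upper() in STOP_CODONS:
        seq = seq[:-3]
    if not keep_start:
        seq = seq[3:]
    return seq


print(trim_codons("ATGAAATAA", keep_stop=False))                    # ATGAAA
print(trim_codons("ATGAAATAA", keep_start=False, keep_stop=False))  # AAA
```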

Further details concerning the algorithms and criteria used for homology 
searches are provided in the Examples below. A skilled artisan can readily 
identify ORFs in the Escherichia coli J96 cosmid library other than those listed 
in Tables 1 through 6, such as ORFs that are overlapping or encoded by the 
opposite strand of an identified ORF in addition to those ascertainable using the 
computer-based systems of the present invention. 

Isolated nucleic acid molecules of the present invention include DNA
molecules having a nucleotide sequence substantially different from the nucleotide
sequence of an ORF described in Tables 1 through 4, but which, due to the
degeneracy of the genetic code, still encode an E. coli J96 PAI protein. The
genetic code is well known in the art. Thus, it would be routine to generate such
degenerate variants.
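Because the genetic code is known, a degenerate variant can be generated mechanically. The Python sketch below builds the standard codon table (NCBI translation table 1, whose amino acid assignments bacteria such as E. coli share), swaps each codon for a different synonymous codon where one exists, and lets the reader confirm by translation that the encoded protein is unchanged. Picking the first alternative codon is an arbitrary illustrative choice; a real design might instead weigh codon usage in the intended host.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS: dict[str, list[str]] = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def _codons(cds: str) -> list[str]:
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence with the standard code."""
    return "".join(CODON_TABLE[c] for c in _codons(cds))


def synonymous_variant(cds: str) -> str:
    """Replace each codon with a different synonymous codon where one exists,
    so the encoded amino acid sequence is unchanged."""
    out = []
    for codon in _codons(cds):
        alternatives = [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]
        out.append(alternatives[0] if alternatives else codon)
    return "".join(out)


original = "ATGAAATTTTAA"
variant = synonymous_variant(original)
print(variant, translate(original), translate(variant))  # ATGAAGTTCTAG MKF* MKF*
```

Codons with no synonym (ATG for methionine, TGG for tryptophan) are left as they are, which is why the first codon is unchanged in the example.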

The present invention further relates to variants of the nucleic acid
molecules of the present invention, which encode portions, analogs or derivatives
of an E. coli protein encoded by an ORF described in Tables 1 through 4.
Non-naturally occurring variants may be produced using art-known mutagenesis
techniques and include those produced by nucleotide substitutions, deletions or
additions. The substitutions, deletions or additions may involve one or more
nucleotides. The variants may be altered in coding regions, non-coding regions,
or both. Alterations in the coding regions may produce conservative or
non-conservative amino acid substitutions, deletions or additions. Especially
preferred among these are silent substitutions, additions and deletions, which do
not alter the properties and activities of the E. coli protein or portions thereof.

Also especially preferred in this regard are conservative substitutions.

Further embodiments of the invention include isolated nucleic acid
molecules comprising a polynucleotide having a nucleotide sequence at least 90%
identical, and more preferably at least 95%, 96%, 97%, 98% or 99% identical, to
the nucleotide sequence of an ORF described in Tables 1 through 6, preferably
Tables 1 through 4. By a polynucleotide having a nucleotide sequence at least, for
example, 95% identical to the reference E. coli ORF nucleotide sequence is
intended that the nucleotide sequence of the polynucleotide is identical to the
reference sequence except that the polynucleotide sequence may include up to
five point mutations per each 100 nucleotides of the ORF sequence. In other
words, to obtain a polynucleotide having a nucleotide sequence at least 95%
identical to a reference ORF nucleotide sequence, up to 5% of the nucleotides in
the reference sequence may be deleted or substituted with another nucleotide, or
a number of nucleotides up to 5% of the total nucleotides in the reference
sequence may be inserted into the reference sequence. These mutations of the
reference sequence may occur at the 5' or 3' terminal positions of the reference
nucleotide sequence or anywhere between those terminal positions, interspersed
either individually among nucleotides in the reference sequence or in one or more
contiguous groups within the reference sequence.
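The percentages translate into an allowed number of differences by simple arithmetic: for a reference ORF of L nucleotides and a threshold of p% identity, at most (100 - p) x L / 100 nucleotides, rounded down, may differ. The Python sketch below makes that count explicit; pooling substitutions, deletions and insertions into a single total measured against the reference length is an interpretive assumption, as is the helper name.

```python
import math
from fractions import Fraction


def max_differences(ref_length: int, percent_identity) -> int:
    """Largest total number of substituted, deleted or inserted nucleotides a
    variant may carry and still count as at least `percent_identity` %
    identical to a reference ORF of `ref_length` nucleotides.

    Assumption: all differences are pooled and measured against the reference
    length. Fraction avoids floating-point rounding for values such as 99.9.
    """
    allowed = (1 - Fraction(str(percent_identity)) / 100) * ref_length
    return math.floor(allowed)


for pct in (90, 95, 99, 99.9):
    print(pct, max_differences(1000, pct))
# 90 100
# 95 50
# 99 10
# 99.9 1
```

For a 100-nt stretch at 95% identity this gives the "five point mutations per each 100 nucleotides" stated above.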

As a practical matter, whether any particular nucleic acid molecule is at
least 90%, 95%, 96%, 97%, 98% or 99% identical to the nucleotide sequence of
an E. coli J96 PAI ORF can be determined conventionally using known computer
programs such as the Bestfit program (Wisconsin Sequence Analysis Package,
Version 8 for Unix, Genetics Computer Group, University Research Park, 575
Science Drive, Madison, WI 53711). Bestfit uses the local homology algorithm
of Smith and Waterman, Advances in Applied Mathematics 2:482-489 (1981),
to find the best segment of homology between two sequences. When using
Bestfit or any other sequence alignment program to determine whether a
particular sequence is, for instance, 95% identical to a reference sequence
according to the present invention, the parameters are set such that the
percentage of identity is calculated over the full length of the reference nucleotide
sequence and gaps in homology of up to 5% of the total number of
nucleotides in the reference sequence are allowed.
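To make the algorithm named above concrete, the Python sketch below is a minimal Smith-Waterman local alignment with traceback, followed by a percent-identity calculation that uses the full reference length as its denominator, as the text requires. It is not Bestfit: the match, mismatch and gap scores are arbitrary illustrative values, and Bestfit's own scoring scheme and gap penalties are not reproduced by the simple linear gap penalty used here.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b (Smith-Waterman, linear gap penalty).

    Returns (score, aligned_a, aligned_b). Scores are illustrative assumptions.
    """
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best local score of an alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell ends the alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_over_reference(aligned_query: str, aligned_ref: str, ref_length: int) -> float:
    """Percent identity with the full reference length as the denominator."""
    matches = sum(1 for q, r in zip(aligned_query, aligned_ref) if q == r and q != "-")
    return 100.0 * matches / ref_length


score, q, r = smith_waterman("ACGTTCGTAC", "ACGTACGTAC")
print(score, q, r, identity_over_reference(q, r, 10))  # 17 ACGTTCGTAC ACGTACGTAC 90.0
```

Dividing by the reference length, rather than by the length of the local alignment, prevents a short, perfectly matching segment from being reported as a high overall identity.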

Preferred are nucleic acid molecules having sequences at least 90%, 95%, 
96%, 97%, 98% or 99% identical to the nucleic acid sequence of an E. coli J96 
PAI ORF that encode a functional polypeptide. By a "functional polypeptide" is
intended a polypeptide exhibiting activity similar, but not necessarily identical, 
to an activity of the protein encoded by the E. coli J96 PAI ORF. For example, 
the E. coli ORF [Contig ID 84, ORF ID 3 (84/3)] encodes a hemolysin. Thus, a 
"functional polypeptide" encoded by a nucleic acid molecule having a nucleotide 
sequence, for example, 95% identical to the nucleotide sequence of 84/3, will also 
possess hemolytic activity. As the skilled artisan will appreciate, assays for 
determining whether a particular polypeptide is "functional" will depend on 
which ORF is used as the reference sequence. Depending on the reference ORF, 
the assay chosen for measuring polypeptide activity will be readily apparent in 
light of the role categories provided in Tables 1, 3, 5 and 6. 

Of course, due to the degeneracy of the genetic code, one of ordinary skill
in the art will immediately recognize that a large number of the nucleic acid
molecules having a sequence at least 90%, 95%, 96%, 97%, 98%, or 99%
identical to the nucleic acid sequence of a reference ORF will encode a functional
polypeptide. In fact, since degenerate variants all encode the same amino acid
sequence, this will be clear to the skilled artisan even without performing a
comparison assay for protein activity. It will be further recognized in the art that,
for such nucleic acid molecules that are not degenerate variants, a reasonable
number will also encode a functional polypeptide. This is because the skilled
artisan is fully aware of amino acid substitutions that are either less likely or not
likely to significantly affect protein function (e.g., replacing one aliphatic amino
acid with a second aliphatic amino acid).
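The point about degenerate variants can be checked mechanically: translate two DNA sequences with the standard genetic code and compare the products. The sketch below is a minimal Python illustration; the function names are hypothetical, it reads only the forward frame from the first base, and it ignores any trailing partial codon.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order: codon index = 16*first + 4*second + third.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate the first reading frame; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop any incomplete final codon
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, usable, 3))


def is_degenerate_variant(seq_a: str, seq_b: str) -> bool:
    """True when two different DNA sequences encode the same amino acid sequence."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)
```

For example, ATGGCTCTG and ATGGCCTTA differ at three of nine positions, yet both encode Met-Ala-Leu.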

For example, guidance concerning how to make phenotypically silent
amino acid substitutions is provided in Bowie, J. U. et al., "Deciphering the
Message in Protein Sequences: Tolerance to Amino Acid Substitutions," Science
247:1306-1310 (1990), wherein the authors indicate that there are two main
approaches for studying the tolerance of an amino acid sequence to change. The
first method relies on the process of evolution, in which mutations are either
accepted or rejected by natural selection. The second approach uses genetic
engineering to introduce amino acid changes at specific positions of a cloned gene
and selections or screens to identify sequences that maintain functionality. As the
authors state, these studies have revealed that proteins are surprisingly tolerant of
amino acid substitutions. The authors further indicate which amino acid changes
are likely to be permissive at a certain position of the protein. For example, most
buried amino acid residues require nonpolar side chains, whereas few features of
surface side chains are generally conserved. Other such phenotypically silent
substitutions are described in Bowie, J.U. et al., supra, and the references cited
therein.

The present invention is further directed to fragments of the isolated
nucleic acid molecules described herein. By a fragment of an isolated nucleic
acid molecule having the nucleotide sequence of an E. coli J96 PAI ORF is
intended fragments at least about 15 nt, and more preferably at least about 20 nt,
still more preferably at least about 30 nt, and even more preferably, at least about
40 nt in length that are useful as diagnostic probes and primers as discussed
herein. Of course, larger fragments 50-500 nt in length are also useful according
to the present invention as are fragments corresponding to most, if not all, of the
nucleotide sequence of an E. coli J96 PAI ORF. By a fragment at least 20 nt in
length, for example, is intended fragments that include 20 or more contiguous
bases from the nucleotide sequence of an E. coli J96 PAI ORF. Since E. coli
ORFs are listed in Tables 1 through 6 and the sequences of the ORFs have been
provided within the contig sequences of SEQ ID NOs: 1 through 142, generating
such DNA fragments would be routine to the skilled artisan. For example, 
restriction endonuclease cleavage or shearing by sonication could easily be used 
to generate fragments of various sizes from the PAI DNA that is incorporated into 
the deposited pools of cosmid clones. Alternatively, such fragments could be 
generated synthetically. 
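Restriction cleavage of a known sequence can be previewed in silico before any DNA is cut. The following Python sketch is an assumption-laden illustration: it uses the EcoRI site GAATTC (cut between G and A) as a default, considers only the top strand, ignores overhangs, and the helper names are hypothetical.

```python
from typing import List


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[str]:
    """Split seq at every occurrence of a recognition site (top strand only).

    cut_offset is the cut position within the site; the default 1 models
    EcoRI (G^AATTC). Overhangs and the bottom strand are ignored.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def probe_sized(fragments: List[str], min_len: int = 20) -> List[str]:
    """Keep fragments of at least min_len nt, the minimum probe size discussed above."""
    return [f for f in fragments if len(f) >= min_len]
```

A sequence with no recognition site comes back as a single fragment, so `probe_sized(digest(seq))` simply reports whether the intact sequence is long enough to serve as a probe.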

Preferred nucleic acid fragments of the present invention include nucleic 
acid molecules encoding epitope-bearing portions of an E. coli J96 PAI protein. 
Methods for determining such epitope-bearing portions are described in detail 
below. 

In another aspect, the invention provides an isolated nucleic acid molecule
comprising a polynucleotide that hybridizes under stringent hybridization
conditions to a portion of the polynucleotide in a nucleic acid molecule of the
invention described above, for instance, an ORF described in Tables 1 through
6, preferably an ORF described in Tables 1, 2, 3 or 4. By "stringent hybridization
conditions" is intended overnight incubation at 42 °C in a solution comprising:
50% formamide, 5x SSC (1x SSC is 150 mM NaCl, 15 mM trisodium citrate), 50 mM
sodium phosphate (pH 7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20
µg/ml denatured, sheared salmon sperm DNA, followed by washing the filters in
0.1x SSC at about 65 °C.
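The SSC concentrations above follow from simple scaling of the 1x composition, and working dilutions follow from C1V1 = C2V2. The Python sketch below assumes a 20x SSC stock, which is a common laboratory stock but is not specified in the text; the function names are hypothetical.

```python
from typing import Tuple

NACL_MM_PER_1X = 150.0     # 1x SSC: 150 mM NaCl
CITRATE_MM_PER_1X = 15.0   # 1x SSC: 15 mM trisodium citrate


def ssc_composition(strength: float) -> Tuple[float, float]:
    """Return (NaCl mM, trisodium citrate mM) for SSC of the given strength (e.g. 5 for 5x)."""
    return NACL_MM_PER_1X * strength, CITRATE_MM_PER_1X * strength


def stock_volume_ml(final_strength: float, final_volume_ml: float,
                    stock_strength: float = 20.0) -> float:
    """Volume of SSC stock needed, from C1*V1 = C2*V2 (20x stock is an assumption)."""
    if final_strength > stock_strength:
        raise ValueError("final strength cannot exceed the stock strength")
    return final_strength * final_volume_ml / stock_strength
```

Under these assumptions 5x SSC contains 750 mM NaCl and 75 mM trisodium citrate, 100 ml of it needs 25 ml of 20x stock, and the 0.1x SSC wash contains 15 mM NaCl and 1.5 mM trisodium citrate.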

By a polynucleotide that hybridizes to a "portion" of a polynucleotide is 
intended a polynucleotide (either DNA or RNA) hybridizing to at least about 15 
nucleotides (nt), and more preferably at least about 20 nt, still more preferably at 
least about 30 nt, and even more preferably about 30-70 nt of the reference 
polynucleotide. These are useful as diagnostic probes and primers as discussed 
above and in more detail below. 

Of course, polynucleotides hybridizing to a larger portion of the reference 
polynucleotide (e.g., an E. coli ORF), for instance, a portion 50-500 nt in length,
or even to the entire length of the reference polynucleotide, are also useful as 
probes according to the present invention, as are polynucleotides corresponding 
to most, if not all, of an E. coli J96 PAI ORF. 
By "expression modulating fragment" (EMF) is intended a series of
nucleotides that modulate the expression of an operably linked, putative
polypeptide-encoding region (encoding region). A sequence is said to "modulate
the expression of an operably linked sequence" when the expression of the
sequence is altered by the presence of the EMF. EMFs include, but are not
limited to, promoters and promoter modulating sequences (inducible elements).
One class of EMFs are fragments that induce the expression of an operably linked
encoding region in response to a specific regulatory factor or physiological event.
EMF sequences can be identified within the E. coli genome by their proximity to
the encoding regions within ORFs described in Tables 1 through 6. An intergenic
segment, or a fragment of the intergenic segment, from about 10 to 200
nucleotides in length, taken 5' from any one of the encoding regions of ORFs of
Tables 1 through 6 will modulate the expression of an operably linked 3'
encoding region in a fashion similar to that found within the naturally linked ORF
sequence. As used herein, an "intergenic segment" refers to the fragments of the
E. coli J96 PAI subgenome that are between two encoding regions herein
described. Alternatively, EMFs can be identified using known EMFs as a target
sequence or target motif in the computer-based systems of the present invention.
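Taking an intergenic segment 5' of an encoding region is a coordinate operation once ORF boundaries are known. This Python sketch assumes a hypothetical `Orf` record with 0-based, end-exclusive coordinates on the forward strand only; it returns up to 200 nt upstream of the chosen ORF, never reaching into the preceding encoding region, and rejects segments shorter than 10 nt. Reverse-strand ORFs would need the reverse complement and are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Orf:
    """Hypothetical forward-strand encoding region: 0-based start, exclusive end."""
    start: int
    end: int


def upstream_segment(genome: str, orfs: List[Orf], target: Orf,
                     max_len: int = 200, min_len: int = 10) -> Optional[str]:
    """Return up to max_len nt immediately 5' of target, stopping at the end of the
    nearest upstream encoding region; None if fewer than min_len nt are available."""
    upstream_ends = [o.end for o in orfs if o.end <= target.start]
    boundary = max(upstream_ends, default=0)  # end of the preceding encoding region
    start = max(boundary, target.start - max_len)
    segment = genome[start:target.start]
    return segment if len(segment) >= min_len else None
```

The returned segment is a candidate only; whether it actually modulates expression is what the EMF trap vector described next is for.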
The presence and activity of an EMF can be confirmed using an EMF trap
vector. An EMF trap vector contains a cloning site 5' to a marker sequence. A
marker sequence encodes an identifiable phenotype, such as antibiotic resistance
or a complementing nutrition auxotrophic factor, which can be identified or
assayed when the EMF trap vector is placed within an appropriate host under
appropriate conditions. As described above, an EMF will modulate the
expression of an operably linked marker sequence. A more detailed discussion
of various marker sequences is provided below.

A sequence that is suspected as being an EMF is cloned in all three
reading frames in one or more restriction sites upstream from the marker
sequence in the EMF trap vector. The vector is then transformed into an
appropriate host using known procedures and the phenotype of the transformed
host is examined under appropriate conditions. As described above, an EMF will
modulate the expression of an operably linked marker sequence.

By a "diagnostic fragment" (DF) is intended a series of nucleotides that
selectively hybridize to E. coli sequences. DFs can be readily identified by
identifying unique sequences within the E. coli J96 PAI subgenome, or by
generating and testing probes or amplification primers consisting of the DF
sequence in an appropriate diagnostic format for amplification or hybridization
selectivity.
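One simple way to look for candidate diagnostic fragments is to list the k-mers of a PAI sequence that never occur, on either strand, in a background genome (for instance, a non-pathogenic strain). The Python sketch below assumes exact matching and uses hypothetical function names; real hybridization selectivity also depends on near-matches, GC content and assay conditions, so candidates found this way would still need the experimental testing described above.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All contiguous substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(target: str, backgrounds: Iterable[str], k: int = 20) -> List[str]:
    """k-mers of target absent (exact match, either strand) from every background sequence."""
    background: Set[str] = set()
    for bg in backgrounds:
        background |= kmers(bg, k)
        background |= kmers(reverse_complement(bg), k)
    return sorted(kmers(target, k) - background)
```

The default k = 20 mirrors the preferred minimum fragment length given earlier; shorter k values find more candidates but are less likely to be selective.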

Each of the ORFs of the E. coli J96 PAI subgenome disclosed in Tables
1 through 4, and EMFs found 5' to the encoding regions of the ORFs, can be used
in numerous ways as polynucleotide reagents. The sequences can be used as
diagnostic probes or diagnostic amplification primers to detect the presence of
uropathogenic E. coli in a sample. This is especially the case with the fragments
or ORFs of Tables 2 and 4, which will be highly selective for uropathogenic E. coli
J96, and perhaps other uropathogenic or extraintestinal strains that include one
or more PAIs.

In addition, the fragments of the present invention, as broadly described,
can be used to control gene expression through triple helix formation or antisense
DNA or RNA, both of which methods are based on the binding of a
polynucleotide sequence to DNA or RNA. Polynucleotides suitable for use in
these methods are usually 20 to 40 bases in length and are designed to be
complementary to a region of the gene involved in transcription (triple helix; see
Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988);
and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense; see
Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense
Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)).
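An antisense oligonucleotide is the reverse complement of its target mRNA window. The following Python sketch (hypothetical function name) builds one for a 20 to 40 nt window, the length range given above, and can return it as DNA or RNA. It says nothing about target-site accessibility or backbone chemistry, and it does not design triple-helix oligonucleotides, which follow different pairing rules.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_oligo(mrna: str, start: int, length: int = 20, as_dna: bool = True) -> str:
    """Return the oligonucleotide complementary (antiparallel) to mrna[start:start+length],
    written 5'->3'."""
    if not 20 <= length <= 40:
        raise ValueError("length outside the 20-40 base range described in the text")
    target = mrna.upper().replace("T", "U")[start:start + length]
    if len(target) != length:
        raise ValueError("target window runs past the end of the mRNA")
    # Complement each base, then reverse so the oligo also reads 5'->3'.
    antisense = target.translate(RNA_COMPLEMENT)[::-1]
    return antisense.replace("U", "T") if as_dna else antisense
```

Returning DNA by default reflects the oligodeoxynucleotide format cited above; `as_dna=False` gives the RNA form.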

Triple helix formation optimally results in a shut-off of RNA
transcription from DNA, while antisense RNA hybridization blocks translation
of an mRNA molecule into polypeptide. Both techniques have been
demonstrated to be effective in model systems. Information contained in the
sequences of the present invention is necessary for the design of an antisense or
triple helix oligonucleotide.

Vectors and Host Cells 

The present invention further provides recombinant constructs comprising 
one or more fragments of the E. coli J96 PAIs. The recombinant constructs of the 
present invention comprise a vector, such as a plasmid or viral vector, into which, 
for example, an E. coli J96 PAI ORF is inserted. The vector may further 
comprise regulatory sequences, including for example, a promoter, operably 
linked to the encoding region of an ORF. For vectors comprising the EMFs of 
the present invention, the vector may further comprise a marker sequence or 
heterologous ORF operably linked to the EMF. Large numbers of suitable 
vectors and promoters are known to those of skill in the art and are commercially 
available for generating the recombinant constructs of the present invention. The 
following vectors are provided by way of example. Bacterial: pBs, phagescript, 
PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a 
(Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). 
Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV,
pMSG, pSVL (Pharmacia). 

Promoter regions can be selected from any desired gene using CAT
(chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.
Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial
promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic
promoters include CMV immediate early, HSV thymidine kinase, early and late
SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the
appropriate vector and promoter is well within the level of ordinary skill in the 
art. 

The present invention further provides host cells containing any one of the 
isolated fragments (preferably an ORF) of the E. coli J96 PAIs described herein. 
The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a 
lower eukaryotic host cell, such as a yeast cell, or the host cell can be a
prokaryotic cell, such as a bacterial cell. Introduction of the recombinant
construct into the host cell can be effected by calcium phosphate transfection,
DEAE-dextran mediated transfection, or electroporation (Davis, L. et al., Basic
Methods in Molecular Biology (1986)). Host cells containing, for example, an
E. coli J96 PAI ORF can be used conventionally to produce the encoded protein.

Polypeptides and Fragments

The invention further provides isolated polypeptides having the amino
acid sequence encoded by an E. coli PAI ORF described in Tables 1 through 6,
preferably Tables 1 through 4, or a peptide or polypeptide comprising a portion
of the above polypeptides. The terms "peptide" and "oligopeptide" are considered
synonymous (as is commonly recognized) and each term can be used 
interchangeably as the context requires to indicate a chain of at least two amino 
acids coupled by peptidyl linkages. The word "polypeptide" is used herein for 
chains containing more than ten amino acid residues. All oligopeptide and 
polypeptide formulas or sequences herein are written from left to right and in the 
direction from amino terminus to carboxy terminus. 

It will be recognized in the art that some amino acid sequences of E. coli
polypeptides can be varied without significant effect on the structure or function
of the protein. If such differences in sequence are contemplated, it should be
remembered that there will be critical areas on the protein which determine
activity. In general, it is possible to replace residues which form the tertiary
structure, provided that residues performing a similar function are used. In other
instances, the type of residue may be completely unimportant if the alteration
occurs at a non-critical region of the protein.

Thus, the invention further includes variations of polypeptides encoded
by ORFs listed in Tables 1 through 6 which show substantial pathogenic
activity or which include regions of particular E. coli PAI proteins such as the
protein portions discussed below. Such mutants include deletions, insertions,
inversions, repeats, and type substitutions (for example, substituting one
hydrophilic residue for another, but not strongly hydrophilic for strongly
hydrophobic as a rule). Small changes or such "neutral" amino acid substitutions
will generally have little effect on activity.

Typically seen as conservative substitutions are the replacements, one for
another, among the aliphatic amino acids Ala, Val, Leu and Ile; interchange of the
hydroxyl residues Ser and Thr; exchange of the acidic residues Asp and Glu;
substitution between the amide residues Asn and Gln; exchange of the basic
residues Lys and Arg; and replacements among the aromatic residues Phe and Tyr.
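The conservative replacements listed in the preceding paragraph can be encoded as groups and used to classify the differences between a reference and a variant sequence. This Python sketch uses only those six groups (Table 7 groups residues somewhat differently), handles ungapped, equal-length sequences only, and its function names are hypothetical.

```python
from typing import List, Tuple

# Groups from the paragraph above: aliphatic, hydroxyl, acidic, amide, basic, aromatic.
CONSERVATIVE_GROUPS = [set("AVLI"), set("ST"), set("DE"), set("NQ"), set("KR"), set("FY")]


def is_conservative(a: str, b: str) -> bool:
    """True if a and b are identical or fall in the same conservative group."""
    a, b = a.upper(), b.upper()
    return a == b or any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str, variant: str) -> List[Tuple[int, str, str, bool]]:
    """List (1-based position, reference residue, variant residue, conservative?)
    for every position where the two ungapped sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles ungapped, equal-length sequences only")
    return [
        (pos + 1, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference.upper(), variant.upper()))
        if r != v
    ]
```

A variant whose differences are all flagged as conservative is a reasonable first candidate for retaining function, though only an activity assay can confirm it.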

As indicated in detail above, further guidance concerning which amino 
acid changes are likely to be phenotypically silent (i.e., are not likely to have a 
significant deleterious effect on a function) can be found in Bowie, J.U., et al.,
"Deciphering the Message in Protein Sequences: Tolerance to Amino Acid 
Substitutions," Science 247:1306-1310 (1990). 

Thus, the fragment, derivative or analog of a polypeptide encoded by an 
ORF described in one of Tables 1 through 6, may be (i) one in which one or more 
of the amino acid residues are substituted with a conserved or non-conserved 
amino acid residue (preferably a conserved amino acid residue) and such 
substituted amino acid residue may or may not be one encoded by the genetic 
code, or (ii) one in which one or more of the amino acid residues includes a 
substituent group, or (iii) one in which the mature polypeptide is fused with 
another compound, such as a compound to increase the half-life of the 
polypeptide (for example, polyethylene glycol), or (iv) one in which the 
additional amino acids are fused to the mature polypeptide, such as an IgG Fc 
fusion region peptide or leader or secretory sequence or a sequence which is 
employed for purification of the mature polypeptide or a proprotein sequence. 
Such fragments, derivatives and analogs are deemed to be within the scope of 
those skilled in the art from the teachings herein. 

Of particular interest are substitutions of charged amino acids with
another charged amino acid and with neutral or negatively charged amino acids.
The latter results in proteins with reduced positive charge to improve the
characteristics of said proteins. The prevention of aggregation is highly desirable.
Aggregation of proteins not only results in a loss of activity but can also be
problematic when preparing pharmaceutical formulations, because aggregates can be
immunogenic (Pinckard et al., Clin. Exp. Immunol. 2:331-340 (1967); Robbins
et al., Diabetes 36:838-845 (1987); Cleland et al., Crit. Rev. Therapeutic Drug
Carrier Systems 10:307-377 (1993)).

The replacement of amino acids can also change the selectivity of binding 
to cell surface receptors. Ostade et al., Nature 361:266-268 (1993) describes
certain mutations resulting in selective binding of TNF-α to only one of the two
known types of TNF receptors. Thus, proteins encoded by the ORFs listed in
Tables 1, 2, 3, 4, 5, or 6, and that bind to a cell surface receptor, may include one 
or more amino acid substitutions, deletions or additions, either from natural 
mutations or human manipulation. 

As indicated, changes are preferably of a minor nature, such as 
conservative amino acid substitutions that do not significantly affect the folding 
or activity of the protein (see Table 7). 



TABLE 7. Conservative Amino Acid Substitutions

Aromatic       Phenylalanine, Tryptophan, Tyrosine
Hydrophobic    Leucine, Isoleucine, Valine
Polar          Glutamine, Asparagine
Basic          Arginine, Lysine, Histidine
Acidic         Aspartic Acid, Glutamic Acid
Small          Alanine, Serine, Threonine, Methionine, Glycine

Amino acids in the proteins encoded by ORFs of the present invention
that are essential for function can be identified by methods known in the art, such
as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham and
Wells, Science 244:1081-1085 (1989)). The latter procedure introduces single
alanine mutations at every residue in the molecule. The resulting mutant
molecules are then tested for biological activity such as receptor binding or in
vitro proliferative activity. Sites that are critical for ligand-receptor
binding can also be determined by structural analysis such as crystallization,
nuclear magnetic resonance or photoaffinity labeling (Smith et al., J. Mol. Biol.
224:899-904 (1992) and de Vos et al., Science 255:306-312 (1992)).
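Generating the mutant panel for an alanine scan is straightforward in silico. The Python sketch below (hypothetical function name) makes one single-alanine mutant per non-alanine position and labels it in the usual original-residue, position, Ala form (for example K2A); positions that are already alanine are skipped.

```python
from typing import List, Tuple


def alanine_scan(protein: str) -> List[Tuple[str, str]]:
    """Return (mutation label, mutant sequence) for every non-alanine residue.

    Positions in the labels are 1-based.
    """
    protein = protein.upper()
    mutants = []
    for i, residue in enumerate(protein):
        if residue == "A":
            continue  # already alanine; nothing to substitute here
        mutants.append((f"{residue}{i + 1}A", protein[:i] + "A" + protein[i + 1:]))
    return mutants
```

Each mutant sequence would then be expressed and assayed; residues whose alanine mutants lose activity are the candidates for functionally essential sites.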

The polypeptides of the present invention are preferably provided in an 
isolated form, and preferably are substantially purified. A recombinantly 
produced version of the polypeptides can be substantially purified by the one-step 
method described in Smith and Johnson, Gene 67:31-40 (1988).

The polypeptides of the present invention include the polypeptide encoded 
by the ORFs listed in Tables 1-6, preferably Tables 1-4, as well as polypeptides 
which have at least 90% similarity, more preferably at least 95% similarity, and 
still more preferably at least 96%, 97%, 98% or 99% similarity to those described 
above, and also include portions of such polypeptides with at least 30 amino acids 
and more preferably at least 50 amino acids. 

By "% similarity" for two polypeptides is intended a similarity score 
produced by comparing the amino acid sequences of the two polypeptides using 
the Bestfit program (Wisconsin Sequence Analysis Package, Version 8 for Unix, 
Genetics Computer Group, University Research Park, 575 Science Drive, 
Madison, WI 53711) and the default settings for determining similarity. Bestfit
uses the local homology algorithm of Smith and Waterman (Advances in Applied
Mathematics 2:482-489, 1981) to find the best segment of similarity between two
sequences. 

By a polypeptide having an amino acid sequence at least, for example, 
95% "identical" to a reference amino acid sequence of a polypeptide is intended 
that the amino acid sequence of the polypeptide is identical to the reference
sequence except that the polypeptide sequence may include up to five amino acid
alterations per each 100 amino acids of the reference amino acid sequence of said
polypeptide. In other words, to obtain a polypeptide having an amino acid
sequence at least 95% identical to a reference amino acid sequence, up to 5% of
the amino acid residues in the reference sequence may be deleted or substituted
with another amino acid, or a number of amino acids up to 5% of the total amino
acid residues in the reference sequence may be inserted into the reference
sequence. These alterations of the reference sequence may occur at the amino or
carboxy terminal positions of the reference amino acid sequence or anywhere
between those terminal positions, interspersed either individually among residues
in the reference sequence or in one or more contiguous groups within the
reference sequence.
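The "up to five alterations per 100 residues" rule reduces to integer arithmetic. The sketch below assumes whole-number identity percentages and rounds down for reference lengths that are not multiples of 100; both choices are assumptions rather than statements from the text. Integer arithmetic avoids floating-point artifacts such as `100 * (1 - 0.9)` evaluating to slightly less than 10. Function names are hypothetical.

```python
def max_alterations(reference_length: int, identity_percent: int) -> int:
    """Largest number of deletions, substitutions or insertions still compatible with
    at least identity_percent identity to a reference of reference_length residues.
    Rounds down (an assumption for lengths that are not multiples of 100)."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity_percent must be between 0 and 100")
    return reference_length * (100 - identity_percent) // 100


def meets_identity(reference_length: int, alterations: int, identity_percent: int) -> bool:
    """True if the given number of alterations stays within the allowed budget."""
    return alterations <= max_alterations(reference_length, identity_percent)
```

For a 200-residue reference at 95% identity the budget is 10 alterations, so a variant with 11 falls outside it.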

As a practical matter, whether any particular polypeptide is at least 90%,
95%, 96%, 97%, 98% or 99% identical to, for instance, the amino acid sequence
encoded by the ORFs listed in Tables 1, 2, 3, 4, 5, or 6 can be determined
conventionally using known computer programs such as the Bestfit program
(Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer
Group, University Research Park, 575 Science Drive, Madison, WI 53711). When
using Bestfit or any other sequence alignment program to determine whether a
particular sequence is, for instance, 95% identical to a reference sequence
according to the present invention, the parameters are set, of course, such that the
percentage of identity is calculated over the full length of the reference amino
acid sequence and that gaps in homology of up to 5% of the total number of
amino acid residues in the reference sequence are allowed.

The polypeptide of the present invention could be used as a molecular 
weight marker on SDS-PAGE gels or on molecular sieve gel filtration columns 
using methods well known to those of skill in the art. 
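Using a polypeptide as a molecular weight marker presupposes an expected mass, which can be estimated from the amino acid sequence. The Python sketch below sums standard average residue masses and adds one water for the free termini; it ignores post-translational modifications and disulfide bonds, and apparent mobility on SDS-PAGE can differ from the calculated mass. The function name is hypothetical.

```python
# Average residue masses in daltons (amino acid minus water), standard published values.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def polypeptide_mass_da(sequence: str) -> float:
    """Approximate average mass of an unmodified polypeptide, in daltons."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
```

Dividing the result by 1000 gives the kDa value usually quoted for gel markers.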

As described in detail below, the polypeptides of the present invention can
also be used to raise polyclonal and monoclonal antibodies, which are useful in
assays for detecting pathogenic protein expression as described below or as
agonists and antagonists capable of enhancing or inhibiting protein function of
important proteins encoded by the ORFs of the present invention. Further, such
polypeptides can be used in the yeast two-hybrid system to "capture" protein
binding proteins which are also candidate agonists and antagonists according to the
present invention. The yeast two-hybrid system is described in Fields and Song,
Nature 340:245-246 (1989).

In another aspect, the invention provides a peptide or polypeptide
comprising an epitope-bearing portion of a polypeptide of the invention. The
epitope of this polypeptide portion is an immunogenic or antigenic epitope of a
polypeptide of the invention. An "immunogenic epitope" is defined as a part of
a protein that elicits an antibody response when the whole protein is the
immunogen. These immunogenic epitopes are believed to be confined to a few
loci on the molecule. On the other hand, a region of a protein molecule to which
an antibody can bind is defined as an "antigenic epitope." The number of
immunogenic epitopes of a protein generally is less than the number of antigenic
epitopes. See, for instance, Geysen et al., Proc. Natl. Acad. Sci. USA 81:3998-4002 (1983).

As to the selection of peptides or polypeptides bearing an antigenic
epitope (i.e., that contain a region of a protein molecule to which an antibody can
bind), it is well known in the art that relatively short synthetic peptides that
mimic part of a protein sequence are routinely capable of eliciting an antiserum
that reacts with the partially mimicked protein. See, for instance, Sutcliffe, J. G.,
Shinnick, T. M., Green, N. and Lerner, R.A. (1983) Antibodies that react with
predetermined sites on proteins. Science 219:660-666. Peptides capable of
eliciting protein-reactive sera are frequently represented in the primary sequence
of a protein, can be characterized by a set of simple chemical rules, and are
confined neither to immunodominant regions of intact proteins (i.e.,
immunogenic epitopes) nor to the amino or carboxyl termini. Peptides that are
extremely hydrophobic and those of six or fewer residues generally are ineffective
at inducing antibodies that bind to the mimicked protein; longer peptides,
especially those containing proline residues, usually are effective. Sutcliffe et al.,
supra, at 661. For instance, 18 of 20 peptides designed according to these
guidelines, containing 8-39 residues covering 75% of the sequence of the
influenza virus hemagglutinin HA1 polypeptide chain, induced antibodies that
reacted with the HA1 protein or intact virus; and 12/12 peptides from the MuLV
polymerase and 18/18 from the rabies glycoprotein induced antibodies that
precipitated the respective proteins.

Antigenic epitope-bearing peptides and polypeptides of the invention are
therefore useful to raise antibodies, including monoclonal antibodies, that bind
specifically to a polypeptide of the invention. Thus, a high proportion of
hybridomas obtained by fusion of spleen cells from donors immunized with an
antigen epitope-bearing peptide generally secrete antibody reactive with the
native protein. Sutcliffe et al., supra, at 663. The antibodies raised by antigenic
epitope-bearing peptides or polypeptides are useful to detect the mimicked
protein, and antibodies to different peptides may be used for tracking the fate of
various regions of a protein precursor which undergoes post-translational
processing. The peptides and anti-peptide antibodies may be used in a variety of
qualitative or quantitative assays for the mimicked protein, for instance in
competition assays since it has been shown that even short peptides (e.g., about
9 amino acids) can bind and displace the larger peptides in immunoprecipitation
assays. See, for instance, Wilson et al., Cell 37:767-778 (1984) at 777. The
anti-peptide antibodies of the invention also are useful for purification of the
mimicked protein, for instance, by adsorption chromatography using methods
well known in the art.

Antigenic epitope-bearing peptides and polypeptides of the invention
designed according to the above guidelines preferably contain a sequence of at
least seven, more preferably at least nine and most preferably between about 15
to about 30 amino acids contained within the amino acid sequence of a
polypeptide of the invention. However, peptides or polypeptides comprising a
larger portion of an amino acid sequence of a polypeptide of the invention,
containing about 30 to about 50 amino acids, or any length up to and including
the entire amino acid sequence of a polypeptide of the invention, also are
considered epitope-bearing peptides or polypeptides of the invention and also are
useful for inducing antibodies that react with the mimicked protein. Preferably,
the amino acid sequence of the epitope-bearing peptide is selected to provide
substantial solubility in aqueous solvents (i.e., the sequence includes relatively
hydrophilic residues and highly hydrophobic sequences are preferably avoided);
and sequences containing proline residues are particularly preferred.
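The guidelines above (more than six residues, preferably about 15 to 30, avoiding highly hydrophobic stretches, favoring proline) can be turned into a first-pass screen. In this Python sketch the Kyte-Doolittle hydropathy scale is a swapped-in choice, not something the text prescribes, and the cutoff of a mean hydropathy of 0.0 is an arbitrary assumption; the function name is hypothetical. Candidates are returned with their proline count so they can be ranked.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (positive = hydrophobic); a swapped-in scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_epitope_peptides(protein: str, length: int = 15,
                               max_mean_hydropathy: float = 0.0
                               ) -> List[Tuple[int, str, float, int]]:
    """Return (0-based start, peptide, mean hydropathy, proline count) for every window
    of the given length whose mean hydropathy does not exceed the (assumed) cutoff."""
    if length <= 6:
        raise ValueError("peptides of six or fewer residues are generally ineffective")
    protein = protein.upper()
    candidates = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        mean_h = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / length
        if mean_h <= max_mean_hydropathy:
            candidates.append((start, peptide, round(mean_h, 3), peptide.count("P")))
    return candidates
```

Sorting the result by proline count (descending) and then by mean hydropathy gives a simple priority list; the final choice still rests on the immunization results.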

The epitope-bearing peptides and polypeptides of the invention may be
produced by any conventional means for making peptides or polypeptides
including recombinant means using nucleic acid molecules of the invention. For
instance, a short epitope-bearing amino acid sequence may be fused to a larger
polypeptide which acts as a carrier during recombinant production and
purification, as well as during immunization to produce anti-peptide antibodies.
Epitope-bearing peptides also may be synthesized using known methods of
chemical synthesis. For instance, Houghten has described a simple method for
synthesis of large numbers of peptides, such as 10-20 mg of 248 different
13-residue peptides representing single amino acid variants of a segment of the HA1
polypeptide which were prepared and characterized (by ELISA-type binding
studies) in less than four weeks. Houghten, R. A. (1985) General method for the
rapid solid-phase synthesis of large numbers of peptides: specificity of
antigen-antibody interaction at the level of individual amino acids. Proc. Natl.
Acad. Sci. USA 82:5131-5135. This "Simultaneous Multiple Peptide Synthesis
(SMPS)" process is further described in U.S. Patent No. 4,631,211 to Houghten
et al. (1986). In this procedure the individual resins for the solid-phase synthesis
of various peptides are contained in separate solvent-permeable packets, enabling
the optimal use of the many identical repetitive steps involved in solid-phase
methods. A completely manual procedure allows 500-1000 or more syntheses to
be conducted simultaneously. Houghten et al., supra, at 5134.

Epitope-bearing peptides and polypeptides of the invention are used to
induce antibodies according to methods well known in the art. See, for instance,
Sutcliffe et al., supra; Wilson et al., supra; Chow, M. et al., Proc. Natl. Acad.
Sci. USA 82:910-914; and Bittle, F. J. et al., J. Gen. Virol. 66:2347-2354 (1985).
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Gcncrally, animals may be immunized with free peptide; however, anti-peptide 
antibody titer may be boosted by coupling of the peptide to a macromolecular 
carrier, such as keyhole limpet hemacyanin (KLH) or tetanus toxoid. For 
instance, peptides containing cysteine may be coupled to carrier using a linker 
such as m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS), while other 
peptides may be coupled to carrier using a more general linking agent such as 
glutaraldehyde. Animals such as rabbits, rats and mice are immunized with either 
free or carrier-coupled peptides, for instance, by intraperitoneal and/or 
intradermal injection of emulsions containing about 100 ug peptide or carrier 
protein and Freund's adjuvant. Several booster injections may be needed, for 
instance, at intervals of about two weeks, to provide a useful titer of anti-peptide 
antibody which can be detected, for example, by EL1SA assay using free peptide 
adsorbed to a solid surface. The titer of anti-peptide antibodies in serum from an 
immunized animal may be increased by selection of anti-peptide antibodies, for 
instance, by adsorption to the peptide on a solid support and elution of the 
selected antibodies according to methods well known in the art. 
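An anti-peptide titer read from an ELISA dilution series is commonly reported as an endpoint titer: the reciprocal of the highest serum dilution whose signal still exceeds a positivity cutoff. The text above does not specify how titer is calculated, so the following Python sketch assumes that convention and a cutoff of the pre-immune mean plus three standard deviations. Both are common laboratory choices, not requirements of the invention, and the function names and numbers are hypothetical.

```python
import statistics


def cutoff_from_preimmune(preimmune_ods, n_sd=3.0):
    """Positivity cutoff: mean of pre-immune ODs plus n_sd sample standard deviations.

    The mean + 3 SD rule is an assumed convention, not part of the source text.
    """
    return statistics.mean(preimmune_ods) + n_sd * statistics.stdev(preimmune_ods)


def endpoint_titer(dilutions, od_values, cutoff):
    """Reciprocal of the highest dilution whose OD exceeds `cutoff`.

    `dilutions` are reciprocal dilution factors in increasing order
    (e.g. 100, 200, 400 ...) and `od_values` the matching ELISA readings.
    Returns None when even the least dilute sample is below the cutoff.
    """
    titer = None
    for dilution, od in zip(dilutions, od_values):
        if od > cutoff:
            titer = dilution
        else:
            break  # stop at the first negative well in the series
    return titer


if __name__ == "__main__":
    print(endpoint_titer([100, 200, 400, 800, 1600, 3200],
                         [2.1, 1.8, 1.2, 0.7, 0.35, 0.15], cutoff=0.3))
```

With a cutoff of 0.3 and the illustrative readings above, the sketch reports an endpoint titer of 1600.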

Immunogenic epitope-bearing peptides of the invention, i.e., those parts
of a protein that elicit an antibody response when the whole protein is the
immunogen, are identified according to methods known in the art. For instance,
Geysen et al., supra, discloses a procedure for rapid concurrent synthesis on solid
supports of hundreds of peptides of sufficient purity to react in an enzyme-linked
immunosorbent assay. Interaction of synthesized peptides with antibodies is then
easily detected without removing them from the support. In this manner a peptide
bearing an immunogenic epitope of a desired protein may be identified routinely
by one of ordinary skill in the art. For instance, the immunologically important
epitope in the coat protein of foot-and-mouth disease virus was located by Geysen
et al., supra, with a resolution of seven amino acids by synthesis of an overlapping
set of all 208 possible hexapeptides covering the entire 213 amino acid sequence
of the protein. Then, a complete replacement set of peptides, in which each of the
20 amino acids was substituted in turn at every position within the epitope, was
synthesized, and the particular amino acids conferring specificity for the reaction
with antibody were determined. Thus, peptide analogs of the epitope-bearing
peptides of the invention can be made routinely by this method. U.S. Patent No.
4,708,781 to Geysen (1987) further describes this method of identifying a peptide
bearing an immunogenic epitope of a desired protein.
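The arithmetic behind the overlapping set is that a sequence of n residues contains n - k + 1 contiguous peptides of length k, so a 213-residue protein yields 213 - 6 + 1 = 208 hexapeptides. A replacement set for a k-residue epitope contains k × 19 single-substitution analogs (k × 20 peptides if the unchanged residue is also synthesized at each position). The Python sketch below enumerates both sets. The input sequences are placeholders, not the foot-and-mouth disease virus coat protein or any sequence of the invention, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def overlapping_peptides(sequence, length=6):
    """All contiguous peptides of `length` residues, each offset by one residue."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def replacement_set(epitope):
    """Every single-residue substitution of `epitope` by each of the other 19 amino acids."""
    variants = []
    for pos, original in enumerate(epitope):
        for aa in AMINO_ACIDS:
            if aa != original:  # skip the unchanged parent residue
                variants.append(epitope[:pos] + aa + epitope[pos + 1:])
    return variants


if __name__ == "__main__":
    print(len(overlapping_peptides("A" * 213, 6)))  # placeholder 213-residue sequence
    print(len(replacement_set("ACDEFG")))           # placeholder hexapeptide epitope
```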
Further still, U.S. Patent No. 5,194,392 to Geysen (1990) describes a
general method of detecting or determining the sequence of monomers (amino
acids or other compounds) which is a topological equivalent of the epitope (i.e.,
a "mimotope") which is complementary to a particular paratope (antigen binding
site) of an antibody of interest. More generally, U.S. Patent No. 4,433,092 to
Geysen (1989) describes a method of detecting or determining a sequence of
monomers which is a topographical equivalent of a ligand which is
complementary to the ligand binding site of a particular receptor of interest.
Similarly, U.S. Patent No. 5,480,971 to Houghten, R. A. et al. (1996) on
Peralkylated Oligopeptide Mixtures discloses linear C1-C7-alkyl peralkylated
oligopeptides and sets and libraries of such peptides, as well as methods for using
such oligopeptide sets and libraries for determining the sequence of a peralkylated
oligopeptide that preferentially binds to an acceptor molecule of interest. Thus,
non-peptide analogs of the epitope-bearing peptides of the invention also can be
made routinely by these methods.

The entire disclosure of each document cited in this section on
"Polypeptides and Peptides" is hereby incorporated herein by reference.

As one of skill in the art will appreciate, E. coli PAI polypeptides of the
present invention and the epitope-bearing fragments thereof described above can
be combined with parts of the constant domain of immunoglobulins (IgG),
resulting in chimeric polypeptides. These fusion proteins facilitate purification
and show an increased half-life in vivo. This has been shown, e.g., for chimeric
proteins consisting of the first two domains of the human CD4-polypeptide and
various domains of the constant regions of the heavy or light chains of
mammalian immunoglobulins (EP A 394,827; Traunecker et al., Nature 331:84-86
(1988)). Fusion proteins that have a disulfide-linked dimeric structure due to
the IgG part can also be more efficient in binding and neutralizing other
molecules than the monomeric E. coli J96 PAI proteins or protein fragments
alone (Fountoulakis et al., J. Biochem. 270:3958-3964 (1995)).

Vaccines 

In another embodiment, the present invention relates to a vaccine, 
preferably in unit dosage form, comprising one or more E. coli J96 PAI antigens 
together with a pharmaceutically acceptable diluent, carrier, or excipient, wherein 
the one or more antigens are present in an amount effective to elicit a protective 
immune response in an animal to pathogenic E. coli. Antigens of E. coli J96 PAI
IV and V may be obtained from polypeptides encoded by the ORFs listed in
Tables 1-6, particularly Tables 1-4, using methods well known in the art.

In a preferred embodiment, the antigens are E. coli J96 PAI IV or PAI V 
proteins that are present on the surface of pathogenic E. coli. In another preferred 
embodiment, the pathogenic E. coli J96 PAI IV or PAI V protein-antigen is 
conjugated to an E. coli capsular polysaccharide (CP), particularly to capsular
polysaccharides that are more prevalent in pathogenic strains, to produce a double
vaccine. CPs, in general, may be prepared or synthesized as described in
Schneerson et al., J. Exp. Med. 152:361-376 (1980); Marburg et al., J. Am. Chem.
Soc. 108:5282 (1986); Jennings et al., J. Immunol. 127:1011-1018 (1981); and
Beuvery et al., Infect. Immun. 40:39-45 (1983). In a further preferred
embodiment, the present invention relates to a method of preparing a 
polysaccharide conjugate comprising: obtaining the above-described E. coli J96 
PAI antigen; obtaining a CP or fragment from pathogenic E. coli; and conjugating 
the antigen to the CP or CP fragment. 

In a preferred embodiment, the animal to be protected is selected from the 
group consisting of humans, horses, deer, cattle, pigs, sheep, dogs, and chickens. 
In a more preferred embodiment, the animal is a human or a dog. 

In a further embodiment, the present invention relates to a prophylactic
method whereby the incidence of pathogenic E. coli-induced symptoms is
decreased in an animal, comprising administering to the animal the above-described
vaccine, wherein the vaccine is administered in an amount effective to
elicit protective antibodies in an animal to pathogenic E. coli. This vaccination
method is contemplated to be useful in protecting against severe diarrhea
(pathogenic intestinal E. coli strains), urinary tract infections (uropathogenic E.
coli) and infections of the brain (extraintestinal E. coli). The vaccine of the
invention is used in an effective amount depending on the route of administration.
Although intra-nasal, subcutaneous or intramuscular routes of administration are
preferred, the vaccine of the present invention can also be administered by an
oral, intraperitoneal or intravenous route. One skilled in the art will appreciate
that the amounts to be administered for any particular treatment protocol can be
readily determined without undue experimentation. Suitable amounts are within
the range of 2 micrograms of the protein per kg body weight to 100 micrograms
per kg body weight.
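Because the suitable amount is stated per kilogram, the absolute protein dose scales linearly with body weight. The Python sketch below converts the stated 2 to 100 micrograms per kg range into a total dose; the function name is hypothetical and the example body weights (a 70 kg human and a 20 kg dog) are chosen only for illustration, not taken from the document.

```python
def dose_range_ug(body_weight_kg, low_ug_per_kg=2.0, high_ug_per_kg=100.0):
    """Return (lowest, highest) total protein dose in micrograms.

    Defaults are the 2-100 micrograms per kg range stated in the text.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_kg * low_ug_per_kg, body_weight_kg * high_ug_per_kg


if __name__ == "__main__":
    print(dose_range_ug(70))  # illustrative 70 kg human
    print(dose_range_ug(20))  # illustrative 20 kg dog
```

For a 70 kg subject this gives 140 µg to 7,000 µg (7 mg) of protein; for a 20 kg subject, 40 µg to 2,000 µg.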

The vaccine can be delivered through a vector such as BCG. The vaccine
can also be delivered as naked DNA coding for target antigens.

The vaccine of the present invention may be employed in such dosage
forms as capsules, liquid solutions, suspensions or elixirs for oral administration,
or sterile liquid forms such as solutions or suspensions. An inert carrier is
preferably used, such as saline, phosphate-buffered saline, or any such carrier in
which the vaccine has suitable solubility properties. The vaccines may be in the
form of single dose preparations or in multi-dose flasks which can be used for
mass vaccination programs. Reference is made to Remington's Pharmaceutical
Sciences, Mack Publishing Co., Easton, PA, Osol (ed.) (1980); and New Trends
and Developments in Vaccines, Voller et al. (eds.), University Park Press,
Baltimore, MD (1978), for methods of preparing and using vaccines.

The vaccines of the present invention may further comprise adjuvants
which enhance production of antibodies and immune cells. Such adjuvants
include, but are not limited to, various oil formulations such as Freund's complete
adjuvant (CFA), the dipeptide known as MDP, saponins (e.g., Quillaja saponin
fraction QA-21, U.S. Patent No. 5,047,540), aluminum hydroxide, or lymphatic
cytokines. Freund's adjuvant is an emulsion of mineral oil and water which is
mixed with the immunogenic substance. Although Freund's adjuvant is powerful,
it is usually not administered to humans. Instead, the adjuvant alum (aluminum
hydroxide) may be used for administration to a human. Vaccine may be adsorbed
onto the aluminum hydroxide, from which it is slowly released after injection.
The vaccine may also be encapsulated within liposomes according to Fullerton,
U.S. Patent No. 4,235,877.



Protein Function 



Each ORF described in Tables 1 and 3 possesses a biological role similar
to the role associated with the identified homologous protein. This allows the
skilled artisan to determine a function for each identified coding sequence. For
example, a partial list of the E. coli protein functions provided in Tables 1 and 3
includes many of the functions associated with virulence of pathogenic bacterial
strains. These include, but are not limited to, adhesins, excretion pathway
proteins, O-antigen/carbohydrate modification, cytotoxins and regulators. A more
detailed description of several of these functions is provided in Example 1 below.



Diagnostic Assays 



In another preferred embodiment, the present invention relates to a
method of detecting pathogenic E. coli nucleic acid in a sample comprising:
a) contacting the sample with the above-described nucleic acid probe,
under conditions such that hybridization occurs, and
b) detecting the presence of the probe bound to pathogenic E. coli nucleic
acid.

In another preferred embodiment, the present invention relates to a
diagnostic kit for detecting the presence of pathogenic E. coli nucleic acid in a
sample comprising at least one container means having disposed therein the
above-described nucleic acid probe.
In another preferred embodiment, the present invention relates to a
diagnostic kit for detecting the presence of pathogenic E. coli antigens in a sample
comprising at least one container means having disposed therein the
above-described antibodies.

In another preferred embodiment, the present invention relates to a
diagnostic kit for detecting the presence of antibodies to pathogenic E. coli
antigens in a sample comprising at least one container means having disposed
therein the above-described antigens.

The present invention provides methods to identify the expression of an
ORF of the present invention, or homolog thereof, in a test sample, using one of
the antibodies of the present invention. Such methods involve incubating a test
sample with one or more of the antibodies of the present invention and assaying
for binding of the antibodies to components within the test sample.

In a further embodiment, the present invention relates to a method for
identifying pathogenic E. coli in an animal comprising analyzing tissue or body
fluid from the animal for a nucleic acid, protein, polypeptide-antigen or antibody
specific to one of the ORFs described in Tables 1-4 herein from E. coli J96 PAI
IV or V. Analysis of nucleic acid specific to pathogenic E. coli can be by PCR
techniques or hybridization techniques (cf. Molecular Cloning: A Laboratory
Manual, second edition, edited by Sambrook, Fritsch, & Maniatis, Cold Spring
Harbor Laboratory, 1989; Eremeeva et al., J. Clin. Microbiol. 32:803-810 (1994),
which describes differentiation among spotted fever group Rickettsiae species by
analysis of restriction fragment length polymorphism of PCR-amplified DNA).
Proteins or antibodies specific to pathogenic E. coli may be identified as
described in Molecular Cloning: A Laboratory Manual, second edition,
Sambrook et al., eds., Cold Spring Harbor Laboratory (1989). More specifically,
antibodies may be raised to E. coli J96 PAI proteins as generally described in
Antibodies: A Laboratory Manual, Harlow and Lane, eds., Cold Spring Harbor
Laboratory (1988). E. coli J96 PAI-specific antibodies can also be obtained from
infected animals (Mather, T. et al., JAMA 205:186-188 (1994)).
In another embodiment, the present invention relates to an antibody
having binding affinity specifically to an E. coli J96 PAI antigen as described
above. The E. coli J96 PAI antigens of the present invention can be used to
produce antibodies or hybridomas. One skilled in the art will recognize that if an
antibody is desired, a peptide can be generated as described herein and used as an
immunogen. The antibodies of the present invention include monoclonal and
polyclonal antibodies, as well as fragments of these antibodies. The invention
further includes single chain antibodies. Antibody fragments which contain the
idiotype of the molecule can be generated by known techniques; for example,
such fragments include but are not limited to: the F(ab')2 fragment, Fab'
fragments, Fab fragments, and Fv fragments.

Of special interest to the present invention are antibodies to pathogenic
E. coli antigens which are produced in humans, or are "humanized" (i.e.,
non-immunogenic in a human) by recombinant or other technology. Humanized
antibodies may be produced, for example, by replacing an immunogenic portion
of an antibody with a corresponding, but non-immunogenic portion (i.e., chimeric
antibodies) (Robinson, R.R. et al., International Patent Publication
PCT/US86/02269; Akira, K. et al., European Patent Application 184,187;
Taniguchi, M., European Patent Application 171,496; Morrison, S.L. et al.,
European Patent Application 173,494; Neuberger, M.S. et al., PCT Application
WO 86/01533; Cabilly, S. et al., European Patent Application 125,023; Better,
M. et al., Science 240:1041-1043 (1988); Liu, A.Y. et al., Proc. Natl. Acad. Sci.
USA 84:3439-3443 (1987); Liu, A.Y. et al., J. Immunol. 139:3521-3526 (1987);
Sun, L.K. et al., Proc. Natl. Acad. Sci. USA 84:214-218 (1987); Nishimura, Y.
et al., Canc. Res. 47:999-1005 (1987); Wood, C.R. et al., Nature 314:446-449
(1985); Shaw et al., J. Natl. Cancer Inst. 80:1553-1559 (1988)). General reviews
of "humanized" chimeric antibodies are provided by Morrison, S.L. (Science
229:1202-1207 (1985)) and by Oi, V.T. et al. (BioTechniques 4:214 (1986)).
Suitable "humanized" antibodies can be alternatively produced by CDR or CEA
substitution (Jones, P.T. et al., Nature 321:522-525 (1986); Verhoeyen et al.,
Science 239:1534 (1988); Beidler, C.B. et al., J. Immunol. 141:4053-4060
(1988)).

In another embodiment, the present invention relates to a hybridoma
which produces the above-described monoclonal antibody. A hybridoma is an
immortalized cell line which is capable of secreting a specific monoclonal
antibody.

In general, techniques for preparing monoclonal antibodies and
hybridomas are well known in the art (Campbell, "Monoclonal Antibody
Technology: Laboratory Techniques in Biochemistry and Molecular Biology,"
Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth
et al., J. Immunol. Methods 35:1-21 (1980)).

In another embodiment, the present invention relates to a method of
detecting a pathogenic E. coli antigen in a sample, comprising: a) contacting the
sample with an above-described antibody, under conditions such that
immunocomplexes form, and b) detecting the presence of said antibody bound to
the antigen. In detail, the methods comprise incubating a test sample with one or
more of the antibodies of the present invention and assaying whether the antibody
binds to the test sample.

Conditions for incubating an antibody with a test sample vary. Incubation
conditions depend on the format employed in the assay, the detection methods
employed, and the type and nature of the antibody used in the assay. One skilled
in the art will recognize that any one of the commonly available immunological
assay formats (such as radioimmunoassays, enzyme-linked immunosorbent
assays, diffusion-based Ouchterlony, or rocket immunofluorescent assays) can
readily be adapted to employ the antibodies of the present invention. Examples
of such assays can be found in Chard, An Introduction to Radioimmunoassay and
Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands
(1986); Bullock et al., Techniques in Immunocytochemistry, Academic Press,
Orlando, FL, Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, Practice and
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands
(1985); and Antibodies: A Laboratory Manual, Harlow and Lane, eds., Cold
Spring Harbor Laboratory (1988).

The immunological assay test samples of the present invention include 
cells, protein or membrane extracts of cells, or biological fluids such as blood, 
serum, plasma, or urine. The test sample used in the above-described method will 
vary based on the assay format, nature of the detection method and the tissues, 
cells or extracts used as the sample to be assayed. Methods for preparing protein 
extracts or membrane extracts of cells are well known in the art and can
readily be adapted in order to obtain a sample which is compatible with the system
utilized.

In another embodiment, the present invention relates to a method of 
detecting the presence of antibodies to pathogenic E. coli in a sample, 
comprising: a) contacting the sample with an above-described antigen, under 
conditions such that immunocomplexes form, and b) detecting the presence of 
said antigen bound to the antibody. In detail, the methods comprise incubating 
a test sample with one or more of the antigens of the present invention and 
assaying whether the antigen binds to the test sample. 

In another embodiment of the present invention, a kit is provided which 
contains all the necessary reagents to carry out the previously described methods 
of detection. The kit may comprise: i) a first container means containing an 
above-described antibody, and ii) second container means containing a conjugate 
comprising a binding partner of the antibody and a label. In another preferred 
embodiment, the kit further comprises one or more other containers comprising 
one or more of the following: wash reagents and reagents capable of detecting 
the presence of bound antibodies. Examples of detection reagents include, but are 
not limited to, labeled secondary antibodies, or in the alternative, if the primary 
antibody is labeled, the chromophoric, enzymatic, or antibody binding reagents
which are capable of reacting with the labeled antibody. The compartmentalized 
kit may be as described above for nucleic acid probe kits. 
One skilled in the art will readily recognize that the antibodies described
in the present invention can readily be incorporated into one of the established kit
formats which are well known in the art.

Screening Assay for Binding Agents 

Using the isolated proteins described herein, the present invention further
provides methods of obtaining and identifying agents that bind to a protein
encoded by an E. coli J96 PAI ORF or to a fragment thereof.
The method involves:

(a) contacting an agent with an isolated protein encoded by an E. coli
J96 PAI ORF, or an isolated fragment thereof; and

(b) determining whether the agent binds to said protein or said
fragment.

The agents screened in the above assay can be, but are not limited to,
peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The
agents can be selected and screened at random or rationally selected or designed
using protein modeling techniques. For random screening, agents such as
peptides, carbohydrates, pharmaceutical agents and the like are selected at
random and are assayed for their ability to bind to the protein encoded by an ORF
of the present invention.

Alternatively, agents may be rationally selected or designed. As used
herein, an agent is said to be "rationally selected or designed" when the agent is
chosen based on the configuration of the particular protein. For example, one
skilled in the art can readily adapt currently available procedures to generate
peptides, pharmaceutical agents and the like capable of binding to a specific
peptide sequence in order to generate rationally designed antipeptide ligands; for
example, see Hurby et al., Application of Synthetic Peptides: Antisense Peptides,
In Synthetic Peptides, A User's Guide, W.H. Freeman, NY (1992), pp. 289-307,
and Kaspczak et al., Biochemistry 28:9230-8 (1989).
In addition to the foregoing, one class of agents of the present invention
can be used to control gene expression through binding to one of the ORF
encoding regions or EMFs of the present invention. As described above, such
agents can be randomly screened or rationally designed and selected. Targeting
the encoding region or EMF allows a skilled artisan to design sequence-specific
or element-specific agents, modulating the expression of either a single ORF
encoding region or multiple encoding regions that rely on the same EMF for
expression control.

One class of DNA binding agents consists of those that contain nucleotide base
residues that hybridize or form a triple helix by binding to DNA or RNA. Such
agents can be based on the classic phosphodiester, ribonucleic acid backbone, or
can be a variety of sulfhydryl or polymeric derivatives having base attachment
capacity.

Agents suitable for use in these methods usually contain 20 to 40 bases
and are designed to be complementary to a region of the gene involved in
transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney
et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or
to the mRNA itself (antisense; Okano, J. Neurochem. 56:560 (1991);
Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press,
Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of
RNA transcription from DNA, while antisense RNA hybridization blocks
translation of an mRNA molecule into polypeptide. Both techniques have been
demonstrated to be effective in model systems. Information contained in the
sequences of the present invention is necessary for the design of an antisense or
triple helix oligonucleotide and other DNA binding agents.
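The complementarity requirement for an antisense agent can be illustrated with a short Python sketch that returns the reverse complement of a 20- to 40-base target region written 5' to 3'. It assumes the target is given as the sense (mRNA-like) strand, produces a DNA oligonucleotide, and models only Watson-Crick base pairing; it does not apply triple-helix design rules, backbone chemistry choices, or off-target checks. The function name and example sequence are hypothetical, not taken from SEQ ID NOs: 1 through 142.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(target, min_len=20, max_len=40):
    """Return the 5'->3' DNA oligonucleotide complementary to `target`.

    `target` is the sense (mRNA-like) strand written 5'->3'; U is read as T.
    Raises ValueError if the length is outside min_len..max_len or if the
    sequence contains characters other than A, C, G, T (or U).
    """
    seq = target.upper().replace("U", "T")
    if not min_len <= len(seq) <= max_len:
        raise ValueError(f"target length {len(seq)} is outside {min_len}-{max_len} bases")
    if set(seq) - set("ACGT"):
        raise ValueError("target contains non-nucleotide characters")
    # complement each base, then reverse so the result reads 5'->3'
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(antisense_oligo("ATGGCTAGCTAGGATCCGTA"))  # placeholder 20-nt target
```

Reversing after complementing keeps the oligonucleotide in the conventional 5' to 3' orientation, so it pairs antiparallel with the target.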

Computer Related Embodiments 

The nucleotide sequence provided in SEQ ID NOs: 1 through 142,
representative fragments thereof, or nucleotide sequences at least 99.9% identical
to the sequences provided in SEQ ID NOs: 1 through 142, can be "provided" in
a variety of media to facilitate use thereof. As used herein, "provided" refers to
a manufacture, other than an isolated nucleic acid molecule, that contains a
nucleotide sequence of the present invention, i.e., the nucleotide sequence
provided in SEQ ID NOs: 1 through 142, a representative fragment thereof, or a
nucleotide sequence at least 99.9% identical to SEQ ID NOs: 1 through 142.
Such a manufacture provides the E. coli J96 PAI subgenomes or a subset thereof
(e.g., one or more E. coli J96 PAI open reading frames (ORFs)) in a form that
allows a skilled artisan to examine the manufacture using means not directly
applicable to examining the E. coli J96 PAI subgenome or a subset thereof as it
exists in nature or in purified form.
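The document does not state how percent identity is computed. Assuming two sequences of equal length already aligned without gaps, identity is the fraction of matching positions, so for a 1,000-nucleotide sequence the 99.9% threshold tolerates at most one mismatch. A minimal Python sketch under that assumption (real comparisons would normally use an alignment program that handles gaps; the function names are hypothetical):

```python
def percent_identity(a, b):
    """Percent identity of two equal-length, already-aligned sequences (no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(a, b, threshold=99.9):
    """True when the gap-free identity of `a` and `b` is at least `threshold` percent."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    reference = "ACGT" * 250             # placeholder 1,000-nt sequence
    variant = "T" + reference[1:]        # one substitution
    print(percent_identity(reference, variant))
```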

In one application of this embodiment, one or more nucleotide sequences
of the present invention can be recorded on computer readable media. As used
herein, "computer readable media" refers to any medium that can be read and
accessed directly by a computer. Such media include, but are not limited to:
magnetic storage media, such as floppy discs, hard disc storage medium, and
magnetic tape; optical storage media such as CD-ROM; electrical storage media
such as RAM and ROM; and hybrids of these categories such as magnetic/optical
storage media. A skilled artisan can readily appreciate how any of the presently
known computer readable media can be used to create a manufacture
comprising computer readable medium having recorded thereon a nucleotide
sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on
computer readable medium. A skilled artisan can readily adopt any of the
presently known methods for recording information on computer readable medium
to generate manufactures comprising the nucleotide sequence information of the
present invention. A variety of data storage structures are available to a skilled
artisan for creating a computer readable medium having recorded thereon a
nucleotide sequence of the present invention. The choice of the data storage
structure will generally be based on the means chosen to access the stored
information. In addition, a variety of data processor programs and formats can
be used to store the nucleotide sequence information of the present invention on
computer readable medium. The sequence information can be represented in a
word processing text file, formatted in commercially available software such as
WordPerfect and Microsoft Word, or represented in the form of an ASCII file,
stored in a database application, such as DB2, Sybase, Oracle, or the like. A
skilled artisan can readily adapt any number of data processor structuring formats
(e.g., text file or database) in order to obtain computer readable medium having
recorded thereon the nucleotide sequence information of the present invention.
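As one illustration of recording sequence information as an ASCII file, the Python sketch below writes and reads the widely used FASTA text format (a ">" header line followed by sequence lines). The choice of FASTA, the 60-character line width, and the record names are assumptions for illustration; the sequences used with it should be treated as placeholders, not SEQ ID NOs: 1 through 142.

```python
def write_fasta(path, records, width=60):
    """Write {name: sequence} records to `path` as an ASCII FASTA file."""
    with open(path, "w", encoding="ascii") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            # wrap each sequence into lines of at most `width` characters
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def read_fasta(path):
    """Read a FASTA file back into a {name: sequence} dictionary."""
    chunks = {}
    name = None
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                name = line[1:]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}
```

A plain-text format of this kind can be stored on any of the media listed above and read by sequence-analysis software without a proprietary word processor.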

By providing the nucleotide sequence of SEQ ID NOs: 1 through 142, 
representative fragments thereof, or nucleotide sequences at least 99.9% identical 
to SEQ ID NOs: 1 through 142, in computer readable form, a skilled artisan can 
routinely access the sequence information for a variety of purposes. Computer 
software is publicly available which allows a skilled artisan to access sequence 
information provided in a computer readable medium. The examples which 
follow demonstrate how software which implements the BLAST (Altschul et al.,
J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem.
17:203-207 (1993)) search algorithms on a Sybase system can be used to identify
open reading frames (ORFs) within the E. coli J96 PAI subgenome that contain 
homology to ORFs or proteins from other organisms. Such ORFs are protein- 
encoding fragments within the E. coli J96 PAI subgenome and are useful in 
producing commercially important proteins such as enzymes used in modifying 
surface O-antigens of bacteria. A comprehensive list of ORFs encoding 
commercially important E. coli J96 PAI proteins is provided in Tables 1 through 
6. 
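The passage above identifies ORFs by homology search with BLAST and BLAZE. A different and much simpler technique, shown here only as a sketch, is a naive scan for ATG-initiated ORFs whose output could then be submitted to such a homology search. The Python sketch assumes the standard bacterial stop codons TAA, TAG and TGA, scans only the three forward frames, ignores alternative start codons such as GTG, and uses a hypothetical minimum-length parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard bacterial stop codons


def find_orfs(seq, min_codons=100):
    """Naive forward-strand ORF scan.

    Returns (start, end, frame) tuples with a 0-based start at ATG and an
    exclusive end just after the stop codon. Length is counted in codons,
    including the stop codon. Only the first ATG before each stop is used.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # step through complete codons in this reading frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

A full analysis would also scan the reverse complement and compare candidate ORFs against database entries, as the text describes.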

The present invention provides a DNA sequence - gene database of 
pathogenicity islands (PAIs) for E. coli involved in infectious diseases. This 
database is useful for identifying and characterizing the basic functions of new 
virulence genes for E. coli involved in uropathogenic and extraintestinal diseases. 
The database provides a number of novel open reading frames that can be 
selected for further study as described herein. 

Selectable insertion mutations in plasmid subclones encoding PAI genes
with potentially significant phenotypes for E. coli uropathogenesis and sepsis can
be isolated. The mutations are then crossed back into wild-type, uropathogenic
E. coli by homologous recombination to create wild-type strains specifically
altered in the targeted gene. The significance of the genes to E. coli pathogenesis
is assessed by in vitro assays and in vivo murine models of sepsis/peritonitis and
ascending urinary tract infection.

New virulence genes and PAI sites in uropathogenic E. coli may be 
identified by the transposon signature-tagged mutagenesis system and negative 
selection of E. coli mutants avirulent in murine models of ascending urinary tract 
infection or peritonitis. 
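In signature-tagged mutagenesis, each transposon mutant carries a unique DNA tag; a pool of tagged mutants is used to infect animals, and tags that are detected in the input pool but not recovered from infected tissue mark mutants that failed to survive, which is the negative selection described above. The Python sketch below shows only that comparison step. The tag names, signal values and detection threshold are hypothetical, and how signals are measured (for example, by hybridization) is outside its scope.

```python
def attenuated_candidates(input_signal, output_signal, threshold):
    """Tags present in the input pool but not recovered from the host.

    input_signal / output_signal map tag IDs to a detection signal
    (for example a hybridization intensity). A tag counts as present when
    its signal is at or above `threshold`; tags missing from output_signal
    are treated as not recovered.
    """
    return sorted(
        tag
        for tag, signal in input_signal.items()
        if signal >= threshold and output_signal.get(tag, 0.0) < threshold
    )


if __name__ == "__main__":
    inoculum = {"t01": 10.0, "t02": 8.0, "t03": 9.0, "t04": 0.5}
    recovered = {"t01": 9.0, "t02": 0.2}
    print(attenuated_candidates(inoculum, recovered, threshold=1.0))
```

Tags that are weak in the input pool (t04 in the example) are excluded, because their absence from the recovered pool would not be informative.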

Epidemiological investigations of new virulence genes and PAIs may be
used to test for their occurrence in the genomes of other pathogenic and
opportunistic members of the Enterobacteriaceae.

One can choose from the ORFs included in SEQ ID NOs: 1 through 142,
using Tables 1 through 6 as a useful guidepost for selecting, as candidates for
targeted mutagenesis, a limited number of candidate genes within the PAIs based
on their homology to virulence, export or regulation genes in other pathogens.
For the large number of apparent genes within the PAIs that do not share
sequence similarity to any entries in the database, the transposon signature-tagged
mutagenesis method developed by David Holden's laboratory can be employed
as an independent means of virulence gene identification.

Allelic knock-outs are constructed using different pir-dependent suicide
vectors (Swihart, K.A. and R.A. Welch, Infect. Immun. 58:1853-1869 (1990)).
In addition, two different animal model systems can be employed for assessment
of pathogenic determinants. The initial identification of E. coli hemolysin as a
virulence factor came from the construction of isogenic E. coli strains that were
tested in a rat model of intra-abdominal sepsis (Welch, R.A. et al., Nature
(London) 294:665-667 (1981)). The ascending UTI (Urinary Tract Infection)
mouse model was also successfully performed with allelic knock-outs of the
hpmA hemolysin of Proteus mirabilis (Swihart, K.A. and R.A. Welch, Infect.
Immun. 58:1853-1869 (1990)).
The present invention further provides systems, particularly computer-based
systems, which contain the sequence information described herein. Such
systems are designed to identify commercially important fragments of the E. coli
J96 PAI subgenome. As used herein, "a computer-based system" refers to the
hardware means, software means, and data storage means used to analyze the
nucleotide sequence information of the present invention. The minimum
hardware means of the computer-based systems of the present invention
comprises a central processing unit (CPU), input means, output means, and data
storage means. A skilled artisan can readily appreciate that any one of the
currently available computer-based systems is suitable for use in the present
invention.

As indicated above, the computer-based systems of the present invention 
comprise a data storage means having stored therein a nucleotide sequence of the 
present invention and the necessary hardware means and software means for 
supporting and implementing a search means. As used herein, "data storage 
means" refers to memory that can store nucleotide sequence information of the 
present invention, or a memory access means which can access manufactures 
having recorded thereon the nucleotide sequence information of the present 
invention. As used herein, "search means" refers to one or more programs which 
are implemented on the computer-based system to compare a target sequence or 
target structural motif with the sequence information stored within the data 
storage means. Search means are used to identify fragments or regions of the
E. coli genome that match a particular target sequence or target motif. A variety
of search algorithms have been publicly disclosed, and a variety of commercially
available software packages for implementing search means can be used in the
computer-based systems of the present invention. Examples of such software
include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX
(NCBI). A skilled artisan can readily recognize that any one of the available
algorithms or implementing software packages for conducting homology searches
can be adapted for use in the present computer-based systems.
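By way of illustration only, the sketch below shows the simplest possible "search means": an exact-match scan of a stored nucleotide sequence for a target sequence on both strands. It is a hypothetical stand-in, not BLAST or MacPattern; the function names (`reverse_complement`, `find_target`) and the toy sequences are assumptions introduced here. Homology search programs additionally tolerate mismatches and gaps and report statistical significance.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (only A, C, G, T and N are handled)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_target(stored: str, target: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact match of target in stored.

    The '+' strand is searched with the target itself; the '-' strand is searched
    with its reverse complement. Overlapping matches are reported. A palindromic
    target is reported once on each strand.
    """
    stored = stored.upper()
    hits = []
    for strand, query in (("+", target.upper()), ("-", reverse_complement(target))):
        start = stored.find(query)
        while start != -1:
            hits.append((start, strand))
            start = stored.find(query, start + 1)   # continue one base further on
    return sorted(hits)


if __name__ == "__main__":
    print(find_target("ACGTTTGCAACGTAA", "ACG"))     # toy sequences, for illustration
```

A hit list sorted by position is the raw material that the ranking and output means described below would then order and present.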
As used herein, a "target sequence" can be any DNA or amino acid
sequence of six or more nucleotides or two or more amino acids. A skilled
artisan can readily recognize that the longer a target sequence is, the less likely
it is to be present as a random occurrence in the database. The
most preferred length of a target sequence is from about 10 to 100
amino acids or from about 30 to 300 nucleotide residues. However, it is well
recognized that commercially important fragments of the E. coli J96 PAI
subgenome sought in such searches, such as sequence fragments involved in gene
expression and protein processing, may be of shorter length.
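The relationship between target length and chance matches can be made concrete. Assuming a random database with uniform base (or residue) composition, which real sequences do not have, the expected number of chance occurrences of one fixed target of length k in a single strand of length N is (N - k + 1)(1/a)^k, where a is the alphabet size (4 for nucleotides, 20 for amino acids). The helper below is hypothetical; the 280,000-nt database (roughly the combined estimated size of the two J96 PAIs) and the 93,000-residue protein database (about one third of that) are illustrative values only.

```python
def expected_random_hits(db_length: int, target_length: int, alphabet_size: int) -> float:
    """Expected chance occurrences of one fixed target in a random single-strand database.

    Assumes every symbol is independent and equally likely (uniform composition).
    """
    if target_length > db_length:
        return 0.0
    positions = db_length - target_length + 1            # possible start positions
    return positions * (1.0 / alphabet_size) ** target_length


if __name__ == "__main__":
    for k in (6, 12, 30):                                 # nucleotide targets
        print(f"{k:>3} nt: {expected_random_hits(280_000, k, 4):.3g} chance hits")
    print(f"  2 aa: {expected_random_hits(93_000, 2, 20):.3g} chance hits")
```

Under these assumptions a 6-nt target is expected to occur dozens of times by chance in a database of this size, while a 30-nt target is essentially never expected to occur by chance, which is why the preferred target lengths given above are longer than the stated minimums.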

As used herein, a "target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the 
sequence(s) are chosen based on a three-dimensional configuration which is 
formed upon the folding of the target motif. There are a variety of target motifs 
known in the art. Protein target motifs include, but are not limited to, enzymic 
active sites and signal sequences. Nucleic acid target motifs include, but are not 
limited to, promoter sequences, hairpin structures and inducible expression 
elements (protein binding sequences). 

Thus, the present invention further provides an input means for receiving 
a target sequence, a data storage means for storing the target sequence and the 
homologous E. coli J96 PAI sequence identified using a search means as 
described above, and an output means for outputting the identified homologous 
E. coli J96 PAI sequence. A variety of structural formats for the input and output
means can be used to input and output information in the computer-based systems 
of the present invention. A preferred format for an output means ranks fragments 
of the E. coli J96 PAI subgenome possessing varying degrees of homology to the 
target sequence or target motif. Such presentation provides a skilled artisan with
a ranking of sequences which contain various amounts of the target sequence or 
target motif and identifies the degree of homology contained in the identified 
fragment. 
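A ranked output of this kind can be sketched as follows. The example assumes the simplest possible homology measure, percent identity over an ungapped window the same length as the target, and ranks every window of the stored sequence by that score; real homology search software uses gapped alignments and substitution scores instead. The function names and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length strings agree (no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def rank_fragments(stored: str, target: str, top_n: int = 5):
    """Return the top_n (identity, start, window) tuples, best identity first."""
    stored, target = stored.upper(), target.upper()
    k = len(target)
    scored = [
        (percent_identity(stored[i:i + k], target), i, stored[i:i + k])
        for i in range(len(stored) - k + 1)
    ]
    # Highest identity first; ties broken by position in the stored sequence.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


if __name__ == "__main__":
    for identity, start, window in rank_fragments("TTTTACGTACGTTTTACGAACGTTT", "ACGTACGT", 3):
        print(f"{start:>3}  {window}  {identity:.1f}%")
```

Each output row carries the degree of homology alongside the fragment position, which is the kind of information the preferred output format described above presents to the user.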

A variety of comparing means can be used to compare a target sequence
or target motif with the data storage means to identify sequence fragments of the
E. coli J96 PAI subgenomes. For example, software that implements the
BLAST and BLAZE algorithms (Altschul et al., J. Mol. Biol.
215:403-410 (1990)) can be used to identify open reading frames within the
E. coli J96 PAI subgenome. A skilled artisan can readily recognize that any one of
the publicly available homology search programs can be used as the search means
for the computer-based systems of the present invention.

One application of this embodiment is provided in Figure 2. Figure 2
provides a block diagram of a computer system 102 that can be used to
implement the present invention. The computer system 102 includes a processor
106 connected to a bus 104. Also connected to the bus 104 are a main memory
108 (preferably implemented as random access memory, RAM) and a variety of
secondary storage devices 110, such as a hard drive 112 and a removable medium
storage device 114. The removable medium storage device 114 may represent,
for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc.
A removable storage medium 116 (such as a floppy disk, a compact disk, a
magnetic tape, etc.) containing control logic and/or data recorded therein may be
inserted into the removable medium storage device 114. The computer system
102 includes appropriate software for reading the control logic and/or the data
from the removable medium storage device 114 once inserted in the removable
medium storage device 114.

A nucleotide sequence of the present invention may be stored in a well
known manner in the main memory 108, any of the secondary storage devices
110, and/or a removable storage medium 116. Software for accessing and
processing the genomic sequence (such as search tools, comparing tools, etc.)
resides in main memory 108 during execution.

Having generally described the invention, the same will be more readily 
understood by reference to the following examples, which are provided by way 
of illustration and are not intended as limiting. 
Experimental

Example 1: High Through-put Sequencing of Cosmid Clones Covering PAI IV
and PAI V in E. coli J96

The complete DNA sequence of the pathogenicity islands PAI IV and
PAI V (~170 kb and ~110 kb, respectively) from uropathogenic E. coli strain J96
(O4:K6) was determined using a strategy, cloning and sequencing method, data
collection and assembly software essentially identical to those used by the TIGR
group for determining the sequence of the Haemophilus influenzae genome
(Fleischmann, R.D., et al., Science 269:496 (1995)). The sequences were then
used for DNA and protein sequence similarity searches of the databases as
described in Fleischmann, Id.

The analysis of the genetic information found within the PAIs of E. coli 
J96 was facilitated by the use of overlapping cosmid clones possessing these 
unique segments of DNA. These cosmid clones were previously constructed and 
mapped (as further described below) as an overlapping set in the laboratory of Dr. 
Doug Berg (Washington University). A gap exists between the left portion of 
cosmid 2 and the end of the PAI IV that would represent the pheV junction to the 
E. coli K-12 genome. 

Uropathogenic strain E. coli J96 (O4:K6) was used as a source of
chromosomal DNA for construction of a cosmid library. E. coli K-12 DH5α and
DH12 (Gibco/BRL, Gaithersburg, Md.) were used as hosts for maintaining
cosmid and plasmid clones. The cosmid library of E. coli J96 DNA was
constructed essentially as described by Bukanov & Berg (Mol. Microbiol.
11:509-523 (1994)). DNA was digested with Sau3AI under conditions that generated
fragments with an average size of 40 to 50 kb and electrophoresed through 1%
agarose gels. Fragments of 35 to 50 kb were isolated and cloned into the Lorist 6
vector that had been linearized with BamHI and treated with bacterial alkaline
phosphatase to block self-ligation. (Lorist 6 is a 5.2-kb moderate-copy-number
cosmid vector with T7 and SP6 promoters close to the cloning site.) Cloned
DNA was packaged in lambda phage particles in vitro by using a commercial kit
(Amersham, Arlington Heights, IL), and cosmid-containing phage particles were
used to transduce E. coli DH5α. Transductant colonies were transferred to 150
µL of Luria-Bertani broth supplemented with kanamycin in 96-well microtiter
plates and grown overnight at 37°C with shaking. Two sets of clones, one for
each PAI, were ultimately assembled as previously described (Swenson et al.,
Infection and Immunity 64:3736-3743 (1996), fully incorporated by reference
herein).

The two sets of clones contain eleven subclones that were employed in
the sequencing method described below. One set of four overlapping cosmid
clones covers the prs-containing PAI V, ATCC Deposit No. 97727, deposited
September 23, 1996. A second set of seven subclones covers much of the
pap-containing PAI IV, ATCC Deposit No. 97726, deposited September 23, 1996.
See Figure 1.

A high through-put, random sequencing method (Fleischmann et al.,
Science 269:496 (1995); Fraser et al., Science 270:397 (1995)) was used to obtain
the sequences for 142 fragments (contigs) of the E. coli J96 PAIs. All clones were
sequenced from both ends to aid in the eventual ordering of contigs during the
sequence assembly process. Briefly, random libraries of ~2 kb clones covering
the two J96 PAIs were constructed, ~2,800 clones were subjected to automated
sequencing (~450 nt/clone), and preliminary assemblies of the sequences were
accomplished, resulting in 142 contigs which, for the two PAIs, total 95
and 135 kb, respectively. The estimated sizes of PAI IV and PAI V based on
the overlapping cosmid clones are 1.7 x 10^5 and 1.1 x 10^5 bp, respectively. The
142 sequences were assembled by means of the TIGR Assembler (Fleischmann
et al.; Fraser et al.; Sutton et al., Genome Sci. Tech. 1:9 (1995)). Sequence and
physical gaps were closed using a combination of strategies (Fleischmann et al.;
Fraser et al.). At present, the average depth of sequencing for each base assembled
in the contigs is 6-fold. The tentative identities of many genes, based on sequence
homology, are given in Tables 1, 3, 5 and 6.
Open reading frames (ORFs) and predicted protein-coding regions were
identified as described (Fleischmann et al.; Fraser et al.) with some modification.
In particular, the statistical prediction of uropathogenic E. coli J96 pathogenicity
island genes was performed with GeneMark (Borodovsky, M. & McIninch, J.,
Comput. Chem. 17:123 (1993)). Regular GeneMark uses nonhomogeneous
Markov models derived from a training set of coding sequences and ordinary
Markov models derived from a training set of noncoding sequences. The ORFs
in Tables 1-6 were identified by GeneMark using a second-order Markov model
trained from known E. coli coding regions and known E. coli non-coding regions.
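To illustrate the general idea only (this is not GeneMark, whose models and decision procedure are more elaborate), the sketch below trains a three-periodic, second-order Markov chain on coding examples, with one transition table per codon position, and an ordinary second-order chain on noncoding examples, then scores a candidate sequence by the log-likelihood ratio of the two models. A positive score means the coding model explains the sequence better in the frame tested. The toy training sequences, the pseudocount of 1, and all function names are assumptions made for this example; only the bases A, C, G and T are handled.

```python
import math
from itertools import product

ALPHABET = "ACGT"
ORDER = 2  # second order: each base is conditioned on the two preceding bases


def _empty_counts(pseudocount):
    """One row per two-base context; every next-base count starts at the pseudocount."""
    return {"".join(ctx): {b: pseudocount for b in ALPHABET}
            for ctx in product(ALPHABET, repeat=ORDER)}


def _to_log_probs(counts):
    """Normalize each context row into log transition probabilities."""
    table = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        table[ctx] = {b: math.log(c / total) for b, c in row.items()}
    return table


def train_noncoding(seqs, pseudocount=1.0):
    """Ordinary (homogeneous) chain: one transition table for every position."""
    counts = _empty_counts(pseudocount)
    for s in seqs:
        s = s.upper()
        for i in range(ORDER, len(s)):
            counts[s[i - ORDER:i]][s[i]] += 1
    return _to_log_probs(counts)


def train_coding(seqs, pseudocount=1.0):
    """Three-periodic chain: a separate table for each codon position (i % 3).

    Assumes every training sequence starts at the first base of a codon.
    """
    tables = [_empty_counts(pseudocount) for _ in range(3)]
    for s in seqs:
        s = s.upper()
        for i in range(ORDER, len(s)):
            tables[i % 3][s[i - ORDER:i]][s[i]] += 1
    return [_to_log_probs(t) for t in tables]


def coding_log_odds(seq, coding, noncoding):
    """log P(seq | coding, frame starting at seq[0]) - log P(seq | noncoding).

    The first ORDER bases, which lack a full context, are not scored.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(ORDER, len(seq)):
        ctx, base = seq[i - ORDER:i], seq[i]
        score += coding[i % 3][ctx][base] - noncoding[ctx][base]
    return score


if __name__ == "__main__":
    coding = train_coding(["GCT" * 30])          # toy "coding" training data
    noncoding = train_noncoding(["AT" * 45])     # toy "noncoding" training data
    print(coding_log_odds("GCTGCTGCTGCT", coding, noncoding))
    print(coding_log_odds("ATATATATATAT", coding, noncoding))
```

The codon-position-specific tables are what make the coding model nonhomogeneous: the same two-base context can predict different next bases depending on where in the codon it falls, whereas the noncoding model uses a single table throughout.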

Among the important genes that are implicated in the virulence of E. coli
J96 PAIs are adhesins, excretion pathway proteins, proteins that participate in
alterations of the O-antigen in the PAIs, cytotoxins, and two-component
(membrane sensor/DNA binding) proteins.

I. Adhesins. It is believed that the principal adhesin determinants
involved in uropathogenicity that are present within PAIs of uropathogenic E. coli
are the pili encoded by the pap-related operons (Hultgren et al., Infect. Immun.
50:370-377 (1993); Stromberg et al., EMBO J. 9:2001-2010 (1990); High et al.,
Infect. Immun. 56:513-517 (1988)) and the distantly related afimbrial adhesins
(Labigne-Roussel et al., Infect. Immun. 46:251-259 (1988)). The presence of two
of these (pap and prs) has been confirmed. In addition, potential genes for five
other adhesins, including sla (described above), AIDA-I (diffuse adherence,
DAEC), hra (heat-resistant hemagglutinin, ETEC), fha (filamentous
hemagglutinin, Bordetella pertussis) and the arg-gingipain proteinase of
Porphyromonas gingivalis, have been found.

II. Type II exoprotein secretion pathway. Highly significant
statistics support the presence of multiple genes involved in the type II exoprotein
pathway. Curiously, two different determinants may be present in
PAI IV: one set of genes has the highest sequence similarity to eps-like
genes (Vibrio cholerae Ctx export) and the other has the greatest similarity to exe
genes (Aeromonas hydrophila aerolysin and protease export). At present, the
assembly of contigs involving these potential genes is incomplete. Thus, it is
uncertain whether two separate and complete determinants are present. However,
it is clear that these genes are newly discovered and novel to pathogenic E. coli,
because the derived sequences do not have either the bfp or hop genes as the
highest matches. The gene products that are the targets of the type II export
pathway are not evident at this time.

WO 98/22575    PCT/US97/21347

Within PAI IV there are sequences which suggest genes very similar to
secD and secF. These two linked genes encode homologous products that are
localized to the inner membrane and are hypothesized to play a late role in the
translocation of leader-peptide-containing proteins across the inner membrane of
gram-negative bacteria. In addition, in each PAI, sequences are found that are
reminiscent of the heat-shock htrA/degA gene that encodes a periplasmic
protease. These may perform an endochaperone-like function, as Pugsley et al. have
hypothesized for different exoprotein pathways.

III. O-antigen/capsule/carbohydrate modification (Nod genes). J96
has the O4 antigen. The O-antigen portion of lipopolysaccharide is encoded by rfb genes
that are located at 45 min on the E. coli chromosome. We have found in both
PAIs a cumulative total of five possible rfb-like genes which could participate in
alterations of the O-antigen in the PAI. Overall, these data suggest that PAIs
provide the genetic potential for greater change of the cell surface for
uropathogenic E. coli strains than was previously known.

The apparent capsule type for strain J96 is a non-sialic acid K6-type.
Sequence similarity "hits" were made in the PAI IV region to two region-1 capsule
(kps) genes involved in the stabilization of polysaccharide synthesis and
polysaccharide export across the inner membrane. This is not altogether
surprising given the genetic mapping of the kps locus to serA at 63 minutes on
the genome of the K1 capsular type of E. coli. This suggests that these kps-like
genes either participate in K6 biosynthesis or are involved in
complex carbohydrate export for other purposes.

An intriguing discovery is the set of hits made on genes involved in bacteria-plant
interactions by Rhizobium, Bradyrhizobium and Agrobacterium. Four
potential genes identified thus far share significant sequence similarity to genes
encoding products that modify lipo-oligosaccharides that influence nodule
morphogenesis on legume roots. These are: ORF140, carbamyl phosphate
synthetase; nodulation protein 1265; phosphate-regulatory protein; and an ORF
at a plant-inducible locus in Agrobacterium. To date there are no descriptions in
the literature of such gene products being utilized by human or animal bacterial
pathogens for the purposes of modification or secretion of extracellular
carbohydrate. However, sequence similarity to the capsular region-2 genes
and to lipo-oligosaccharide biosynthetic genes in Rhizobium spp. has recently been
noted by Petit (1995).

IV. Cytotoxins. Besides the previously known hemolysin and CNF
toxins in the PAIs, sequences similar to the shlBA operon, which encodes a
cytolytic toxin of Serratia marcescens and Proteus mirabilis, were found in each
PAI (cosmids 5 and 12). Ironically, the P. mirabilis hemolysin (HpmA) member of this family
of toxins was discovered by Uphoff and Welch (1990), but was not thought to exist
in other members of the Enterobacteriaceae (Swihart (1990)). A shlB-like
transporter also appears to be involved in the export of the filamentous
hemagglutinin of Bordetella pertussis described above and of a cell
surface adhesin of Haemophilus influenzae. It has been demonstrated that cosmid
#5 of E. coli J96 encodes an extracellular protein of ~180 kDa that is
cross-reactive with polyclonal antisera to the P. mirabilis HpmA hemolysin. Thus,
there is evidence suggesting that there is a new member of this family of proteins in
extraintestinal E. coli isolates. In addition, there is also a hit on the FhaC
hemolysin-like gene within PAI V, although its statistical significance for the
sequence thus far available is only 0.0043.

V. Regulators. A common regulatory motif in bacteria is the two-component
(membrane sensor/DNA binding) system. In numerous instances in
pathogenic bacteria, external signals in the environment cause membrane-bound
protein kinases to phosphorylate a cytoplasmic protein, which in turn acts as either
a negative or positive effector of transcription of large sets of operons. On
cosmid 11, representing PAI V, sequences for two-component regulators were
found in two different PstI clones (similar probabilities for OmpR/AlgB and,
separately, RcsC, with probabilities at the 10^-22 level).

In addition, the phosphoglycerate transport system (pgtA, pgtC, and pgtP),
including the pgtB regulator, is present in PAI IV. This transport system, which
was originally described in S. typhimurium, has not been recognized as a component of
any pathogenic E. coli genome. The operon had been previously mapped at 49
minutes, near or within one of the S. typhimurium chromosome-specific loops not
present in the K-12 genome. It should be noted that the E. coli K-12 glpT gene
product is similar to the pgtP gene product (37% identity), but the E. coli J96 genes
are clearly homologs of the pgt genes, and their linkage within the middle of the PAI
IV element (cosmid #4) is suspicious.

VI. Mobile genetic elements. There are numerous sequences that
share similarity to genes found on insertion elements, plasmids and phages. The
temperate bacteriophage P4 inserts within tRNA loci in the E. coli chromosome.
The hypothesis was made that PAIs are the result of bacteriophage P4-virulence
gene recombination events (Blum et al., Infect. Immun. 62:606-614 (1994)). Data
supporting this hypothesis were found during our sequencing with the
identification of P4-like sequences in each of the PAIs (cosmids 7 and 9). This
is a very important preliminary result which supports the hypothesis that PAIs can
be identified by common sequence or genetic elements. However, there are
indications that multiple mobile genetic elements were involved in the evolution of the
J96 PAIs. Conjugal plasmid-related sequences may also be present at two
different locations (F factor and R1 plasmid). Sequences for multiple transposable
elements are present that are likely to have originated from different bacterial
genera (Tn1000, IS630, IS911, IS100, IS21, IS1203, IS5376 (B. stearothermophilus)
and RHS). Of particular interest is IS100, which was
originally identified in Yersinia pestis (Fetherston et al., Mol. Microbiol.
6:2693-2704 (1992)). The presence of IS100 is significant because it has been associated
with the termini of a large chromosomal element encoding pigmentation and
some aspect of virulence in Y. pestis. This element undergoes spontaneous
deletions similar to those of the PAIs from E. coli 536 (Fetherston et al., Mol. Microbiol.
6:2693-2704 (1992)) and appears to participate in plasmid-chromosome
rearrangements. This element was not previously known to be in genera outside
of Yersinia.

The discovery of the apparent att site for bacteriophage P2 in the PAIs is
interesting. P2 acts as a helper phage for the P4 satellite phage. The P2 att site
is at 44 min in the K-12 genome. The significance of this hit is unknown at
present, but it may be explained either as a cloning artifact (some K-12 fragments
in the PstI library of cosmid 5) or as evidence of some curious chromosomal-P4/P2
phage history. It may indicate that the J96 PAIs are composites of multiple
smaller PAIs.



Example 2: Preparation of PCR Primers and Amplification of DNA

Various fragments of the sequenced E. coli J96 PAIs, such as those
disclosed in Tables 1 through 6, can be used, in accordance with the present
invention, to prepare PCR primers. The PCR primers are preferably at least 15
bases, and more preferably at least 18 bases, in length. When selecting a primer
sequence, it is preferred that the primer pairs have approximately the same G/C
ratio, so that their melting temperatures are approximately the same. The PCR primers
are useful during PCR cloning of the ORFs described herein.
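The G/C-matching guideline above can be checked with a few lines of code. The sketch assumes the Wallace rule of thumb, Tm of roughly 2 °C per A or T plus 4 °C per G or C, which is only a rough estimate for short oligonucleotides. The 18-nt minimum follows the preferred primer length stated above; the 4 °C tolerance, the function names, and the example primers are hypothetical choices for illustration.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2.0 * at + 4.0 * gc


def check_primer_pair(fwd: str, rev: str, min_len: int = 18,
                      max_tm_diff: float = 4.0) -> list[str]:
    """Return a list of problems; an empty list means the pair passes these simple checks."""
    problems = []
    for name, primer in (("forward", fwd), ("reverse", rev)):
        if len(primer) < min_len:
            problems.append(f"{name} primer is shorter than {min_len} nt")
    diff = abs(wallace_tm(fwd) - wallace_tm(rev))
    if diff > max_tm_diff:
        problems.append(f"Tm difference of {diff:.1f} C exceeds {max_tm_diff:.1f} C")
    return problems


if __name__ == "__main__":
    fwd, rev = "ATGAAACGTCTGGCTGAC", "TTAGCCAGCAGTTTCAGG"   # hypothetical primers
    for p in (fwd, rev):
        print(p, f"GC={gc_fraction(p):.2f}", f"Tm~{wallace_tm(p):.0f} C")
    print(check_primer_pair(fwd, rev) or "pair passes")
```

Because the Wallace rule depends only on base composition, two primers of equal length and equal G/C content receive the same estimated Tm, which is the intuition behind matching the G/C ratio of a primer pair.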



Example 3: Gene expression from DNA Sequences Corresponding to ORFs 

A fragment of an E. coli J96 PAI (preferably, a protein-encoding sequence
provided in Tables 1 through 6) is introduced into an expression vector using
conventional technology (techniques to transfer cloned sequences into expression
vectors that direct protein translation in mammalian, yeast, insect or bacterial
expression systems are well known in the art). Vectors and expression
systems are commercially available from a variety of suppliers including
Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen
(San Diego, California). If desired, to enhance expression and facilitate proper
protein folding, the codon context and codon pairing of the sequence may be
optimized for the particular expression organism, as explained by Hatfield et al.,
U.S. Pat. No. 5,082,767, which is hereby incorporated by reference.

The following is provided as one exemplary method to generate
polypeptide(s) from a cloned ORF of an E. coli J96 PAI whose sequence is
provided in SEQ ID NOs: 1 through 142. A poly A sequence can be added to the
construct by, for example, splicing out the poly A sequence from pSG5
(Stratagene) using BglI and SalI restriction endonuclease enzymes and
incorporating it into the mammalian expression vector pXT1 (Stratagene) for use
in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the
gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in
the construct allows efficient stable transfection. The vector includes the Herpes
Simplex thymidine kinase promoter and the selectable neomycin gene. The
E. coli J96 PAI DNA is obtained by PCR from the bacterial vector using
oligonucleotide primers complementary to the E. coli J96 PAI DNA and
containing restriction endonuclease sequences for PstI incorporated into the 5'
primer and BglII at the 5' end of the corresponding E. coli J96 PAI DNA 3'
primer, taking care to ensure that the E. coli J96 PAI DNA is positioned such that
it is followed by the poly A sequence. The purified fragment obtained from the
resulting PCR reaction is digested with PstI, blunt ended with an exonuclease,
digested with BglII, purified and ligated to pXT1, now containing a poly A
sequence and digested with BglII.

The ligated product is transfected into mouse NIH 3T3 cells using
Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions
outlined in the product specification. Positive transfectants are selected after
growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).
The protein is preferably released into the supernatant. However, if the protein
has membrane binding domains, the protein may additionally be retained within
the cell or expression may be restricted to the cell surface.

Since it may be necessary to purify and locate the transfected product,
synthetic 15-mer peptides synthesized from the predicted E. coli J96 PAI DNA
sequence are injected into mice to generate antibodies to the polypeptide encoded
by the E. coli J96 PAI DNA.

If antibody production is not possible, the E. coli J96 PAI DNA sequence
is additionally incorporated into eukaryotic expression vectors and expressed as
a chimera with, for example, β-globin. Antibody to β-globin is used to purify the
chimera. Corresponding protease cleavage sites engineered between the β-globin
gene and the E. coli J96 PAI DNA are then used to separate the two polypeptide
fragments from one another after translation. One useful expression vector for
generating β-globin chimeras is pSG5 (Stratagene). This vector encodes rabbit
β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases
the level of expression. These techniques as described are well known to those
skilled in the art of molecular biology. Standard methods are available from the
technical assistance representatives from Stratagene, Life Technologies, Inc., or
Promega. Polypeptides may additionally be produced from either construct using
in vitro translation systems such as the In vitro Express™ Translation Kit
(Stratagene).

Example 4 

E. coli Expression of an E. coli J96 PAI ORF and protein purification

An E. coli J96 PAI ORF described in Tables 1 through 6 is selected and
amplified using PCR oligonucleotide primers designed from the nucleotide
sequences flanking the selected ORF and/or from portions of the ORF's NH2- or
COOH-terminus. Additional nucleotides containing restriction sites to facilitate
cloning are added to the 5' and 3' sequences, respectively.
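As an illustration of adding restriction sites to the 5' and 3' primers, the sketch below builds a forward primer from the first bases of an ORF and a reverse primer from the reverse complement of its last bases, each preceded by a recognition site. It assumes SalI (GTCGAC) for the 5' primer and XbaI (TCTAGA) for the 3' primer, the enzyme pair named as an example later in this Example. The 18-nt annealing length, the two-base "GC" clamp placed 5' of each site (on the assumption that it helps the enzyme cut near the end of the PCR product), the function names, and the ORF sequence are hypothetical. The sketch does not check the reading frame relative to the vector.

```python
# Recognition sequences (5'->3'). Both sites are palindromic, so a scan of
# one strand of the ORF also covers the other strand.
SITES = {"SalI": "GTCGAC", "XbaI": "TCTAGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (upper- or lowercase A, C, G, T)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def add_cloning_sites(orf: str, five_enzyme: str = "SalI", three_enzyme: str = "XbaI",
                      anneal_len: int = 18, clamp: str = "GC") -> tuple[str, str]:
    """Return (forward, reverse) primers, each written 5'->3'."""
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing region")
    for enzyme in (five_enzyme, three_enzyme):
        if SITES[enzyme] in orf:
            # An internal site would cut the insert during digestion.
            raise ValueError(f"{enzyme} site occurs inside the ORF")
    forward = clamp + SITES[five_enzyme] + orf[:anneal_len]
    reverse = clamp + SITES[three_enzyme] + reverse_complement(orf[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    orf = "ATGAAACGTCTGGCTGACGGTTATCCGCTGAAACTGGCTTAA"   # hypothetical ORF
    fwd, rev = add_cloning_sites(orf)
    print("forward:", fwd)
    print("reverse:", rev)
```

The reverse primer anneals to the 3' end of the ORF on the opposite strand, which is why it is built from the reverse complement of the last bases rather than from the bases themselves.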

The restriction sites are selected to be convenient to restriction sites in the
bacterial expression vector pQE60, which is used for bacterial expression in this
example (QIAGEN, Inc., 9259 Eton Avenue, Chatsworth, CA 91311). pQE60
encodes ampicillin antibiotic resistance ("Ampr") and contains a bacterial origin
of replication ("ori"), an IPTG-inducible promoter, a ribosome binding site
("RBS"), six codons encoding histidine residues that allow affinity purification
using nickel-nitrilo-tri-acetic acid ("Ni-NTA") affinity resin sold by QIAGEN,
Inc., supra, and suitable single restriction enzyme cleavage sites. These elements
are arranged such that a DNA fragment encoding a polypeptide may be inserted
in such a way as to produce that polypeptide with the six His residues (i.e., a
"6 X His tag") covalently linked to the carboxyl terminus of that polypeptide.

The DNA sequence encoding the desired portion of an E. coli J96 PAI is 
amplified from the deposited cDNA clone using PCR oligonucleotide primers 
which anneal to the amino terminal sequences of the desired portion of the E. coli 
protein and to sequences in the deposited construct 3' to the cDNA coding 
sequence. Additional nucleotides containing restriction sites to facilitate cloning 
in the pQE60 vector are added to the 5' and 3' sequences, respectively. 

The amplified E. coli J96 PAI DNA fragments and the vector pQE60 are 
digested with one or more appropriate restriction enzymes, such as SalI and XbaI,
and the digested DNAs are then ligated together. Insertion of the E. coli J96 PAI 
DNA into the restricted pQE60 vector places the E. coli J96 PAI protein coding 
region, including its associated stop codon, downstream from the IPTG-inducible 
promoter and in-frame with an initiating AUG. The associated stop codon 
prevents translation of the six histidine codons downstream of the insertion point. 

The ligation mixture is transformed into competent E. coli cells using 
standard procedures such as those described in Sambrook et al, Molecular 
Cloning: a Laboratory Manual 2nd Ed.; Cold Spring Harbor Laboratory Press, 
Cold Spring Harbor, NY (1989). E. coli strain M15/rep4, containing multiple 
copies of the plasmid pREP4, which expresses the lac repressor and confers 
kanamycin resistance ("Kanr"), is used in carrying out the illustrative example 
described herein. This strain, which is only one of many that are suitable for 
expressing an E. coli J96 PAI protein, is available commercially from QIAGEN, 
Inc., supra. Transformants are identified by their ability to grow on LB plates in
the presence of ampicillin and kanamycin. Plasmid DNA is isolated from
resistant colonies and the identity of the cloned DNA confirmed by restriction
analysis, PCR and DNA sequencing.

Clones containing the desired constructs are grown overnight ("O/N") in
liquid culture in LB media supplemented with both ampicillin (100 µg/ml) and
kanamycin (25 µg/ml). The O/N culture is used to inoculate a large culture, at a
dilution of approximately 1:25 to 1:250. The cells are grown to an optical density
at 600 nm ("OD600") of between 0.4 and 0.6. Isopropyl-β-D-thiogalactopyranoside
("IPTG") is then added to a final concentration of 1 mM to induce transcription
from the lac repressor-sensitive promoter by inactivating the lacI repressor.
Cells subsequently are incubated further for 3 to 4 hours. Cells
then are harvested by centrifugation.

The cells are then stirred for 3-4 hours at 4°C in 6M guanidine-HCl, pH 8.
The cell debris is removed by centrifugation, and the supernatant containing the
E. coli J96 PAI protein is dialyzed against 50 mM Na-acetate buffer, pH 6,
supplemented with 200 mM NaCl. Alternatively, the protein can be successfully
refolded by dialyzing it against 500 mM NaCl, 20% glycerol, 25 mM Tris/HCl
pH 7.4, containing protease inhibitors. After renaturation the protein can be
purified by ion exchange, hydrophobic interaction and size exclusion
chromatography. Alternatively, an affinity chromatography step such as an
antibody column can be used to obtain pure E. coli J96 PAI protein. The purified
protein is stored at 4°C or frozen at -80°C.



Example 5 

Cloning and Expression of an E. coli J96 PAI protein in a Baculovirus Expression System

An E. coli J96 PAI ORF described in Tables 1 through 6 is selected and
amplified as above. The plasmid is digested with appropriate restriction enzymes
and, optionally, can be dephosphorylated using calf intestinal phosphatase, using
routine procedures known in the art. The DNA is then isolated from a 1%
agarose gel using a commercially available kit ("Geneclean" BIO 101 Inc., La
Jolla, Ca.). This vector DNA is designated herein "V1".

Fragment F1 and the dephosphorylated plasmid V1 are ligated together
with T4 DNA ligase. E. coli HB101 or other suitable E. coli hosts such as XL-1
Blue (Stratagene Cloning Systems, La Jolla, CA) cells are transformed with the
ligation mixture and spread on culture plates. Bacteria are identified that contain
the plasmid with the E. coli J96 PAI gene by digesting DNA from individual
colonies using appropriate restriction enzymes and then analyzing the digestion
product by gel electrophoresis. The sequence of the cloned fragment is confirmed
by DNA sequencing. This plasmid is designated herein pBac E. coli J96.

Five µg of the plasmid pBac E. coli J96 is co-transfected with 1.0 µg of
a commercially available linearized baculovirus DNA ("BaculoGold™
baculovirus DNA", Pharmingen, San Diego, CA.), using the lipofection method
described by Felgner et al., Proc. Natl. Acad. Sci. USA 84:7413-7417 (1987). One
µg of BaculoGold™ virus DNA and 5 µg of the plasmid pBac E. coli J96 are
mixed in a sterile well of a microtiter plate containing 50 µl of serum-free Grace's
medium (Life Technologies Inc., Gaithersburg, MD). Afterwards, 10 µl
Lipofectin plus 90 µl Grace's medium are added, mixed and incubated for 15
minutes at room temperature. Then the transfection mixture is added dropwise
to Sf9 insect cells (ATCC CRL 1711) seeded in a 35 mm tissue culture plate with
1 ml Grace's medium without serum. The plate is rocked back and forth to mix
the newly added solution. The plate is then incubated for 5 hours at 27 °C. After
5 hours the transfection solution is removed from the plate and 1 ml of Grace's
insect medium supplemented with 10% fetal calf serum is added. The plate is put
back into an incubator and cultivation is continued at 27 °C for four days.

After four days the supernatant is collected and a plaque assay is
performed, as described by Summers and Smith, supra. An agarose gel with
"Blue Gal" (Life Technologies Inc.) is used to allow easy identification and
isolation of gal-expressing clones, which produce blue-stained plaques. (A
detailed description of a "plaque assay" of this type can also be found in the user's
guide for insect cell culture and baculovirology distributed by Life Technologies
Inc., pages 9-10.) After appropriate incubation, blue-stained plaques are picked
with the tip of a micropipettor (e.g., Eppendorf). The agar containing the
recombinant viruses is then resuspended in a microcentrifuge tube containing 200
µl of Grace's medium, and the suspension containing the recombinant baculovirus
is used to infect Sf9 cells seeded in 35 mm dishes. Four days later the
supernatants of these culture dishes are harvested and stored at 4 °C.
The recombinant virus is called V-E. coli J96.

To verify the expression of the E. coli gene, Sf9 cells are grown in Grace's
medium supplemented with 10% heat-inactivated FBS. The cells are infected
with the recombinant baculovirus V-E. coli J96 at a multiplicity of infection
("MOI") of about 2. Six hours later the medium is removed and replaced with
SF900 II medium minus methionine and cysteine (available from Life
Technologies Inc.). If radiolabeled proteins are desired, 5 µCi of
35S-methionine and 5 µCi of 35S-cysteine (available from Amersham) are added
42 hours later. The cells are further incubated for 16 hours and then harvested by
centrifugation. The proteins in the supernatant as well as the intracellular proteins
are analyzed by SDS-PAGE followed by autoradiography (if radiolabeled).
Microsequencing of the amino terminus of the purified protein may be used to
determine the amino-terminal sequence of the mature protein and thus the
cleavage point and length of the secretory signal peptide.
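The MOI of about 2 stated above determines how much virus stock to add once the cell number and the stock titer are known. Below is a minimal sketch. The cell count and titer are hypothetical. The uninfected-fraction estimate assumes that infection events follow a Poisson distribution, which is a common simplification.

```python
import math


def inoculum_volume_ml(cells: float, moi: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock that delivers `moi` plaque-forming units per cell."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return cells * moi / titer_pfu_per_ml


def fraction_uninfected(moi: float) -> float:
    """Poisson estimate of the fraction of cells that receive no virus at all."""
    return math.exp(-moi)


if __name__ == "__main__":
    # Hypothetical numbers: 1e6 Sf9 cells and a stock titer of 1e8 pfu/ml.
    vol = inoculum_volume_ml(cells=1e6, moi=2, titer_pfu_per_ml=1e8)
    print(f"{vol * 1000:.1f} µl of stock; "
          f"{fraction_uninfected(2):.1%} of cells expected uninfected")
```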

Example 6 
Cloning and Expression in Mammalian Cells 

Most of the vectors used for the transient expression of an E. coli J96 PAI
gene in mammalian cells should carry the SV40 origin of replication. This allows
the replication of the vector to high copy numbers in cells (e.g., COS cells) that
express the T antigen required for the initiation of viral DNA synthesis. Any
other mammalian cell line can also be utilized for this purpose.

A typical mammalian expression vector contains the promoter element,
which mediates the initiation of transcription of mRNA, the protein coding
sequence, and signals required for the termination of transcription and
polyadenylation of the transcript. Additional elements include enhancers, Kozak
sequences and intervening sequences flanked by donor and acceptor sites for
RNA splicing. Highly efficient transcription can be achieved with the early and
late promoters from SV40, the long terminal repeats (LTRs) from retroviruses,
e.g., RSV, HTLVI and HIVI, and the early promoter of the cytomegalovirus (CMV).
However, cellular elements can also be used (e.g., the human actin promoter).
Suitable expression vectors for use in practicing the present invention include, for
example, vectors such as pSVL and pMSG (Pharmacia, Uppsala, Sweden),
pRSVcat (ATCC 37152), pSV2dhfr (ATCC 37146) and pBC12MI (ATCC
67109). Mammalian host cells that could be used include human HeLa, 293, H9
and Jurkat cells, mouse NIH3T3 and C127 cells, Cos 1, Cos 7 and CV-1, quail
QC1-3 cells, mouse L cells and Chinese hamster ovary (CHO) cells.

Alternatively, the gene can be expressed in stable cell lines that contain
the gene integrated into a chromosome. Co-transfection with a selectable
marker such as dhfr, gpt, neomycin or hygromycin allows the identification and
isolation of the transfected cells.

The transfected gene can also be amplified to express large amounts of the
encoded protein. The DHFR (dihydrofolate reductase) marker is useful to
develop cell lines that carry several hundred or even several thousand copies of
the gene of interest. Another useful selection marker is the enzyme glutamine
synthase (GS) (Murphy et al., Biochem J. 227:277-279 (1991); Bebbington et al.,
Bio/Technology 10:169-175 (1992)). Using these markers, the mammalian cells
are grown in selective medium and the cells with the highest resistance are
selected. These cell lines contain the amplified gene(s) integrated into a
chromosome. Chinese hamster ovary (CHO) and NS0 cells are often used for the
production of proteins.

The expression vectors pC1 and pC4 contain the strong promoter (LTR)
of the Rous Sarcoma Virus (Cullen et al., Molecular and Cellular Biology, 438-447
(March 1985)) plus a fragment of the CMV enhancer (Boshart et al., Cell
47:521-530 (1985)). Multiple cloning sites, e.g., with the restriction enzyme
cleavage sites BamHI, XbaI and Asp718, facilitate the cloning of the gene of
interest. In addition, the vectors contain the 3' intron and the polyadenylation and
termination signals of the rat preproinsulin gene.

Example 6(a): Cloning and Expression in COS Cells 

The expression plasmid, pE. coli J96HA, is made by cloning a cDNA
encoding E. coli J96 PAI protein into the expression vector pcDNAI/Amp or
pcDNAIII (which can be obtained from Invitrogen, Inc.).

The expression vector pcDNAI/Amp contains: (1) an E. coli origin of
replication effective for propagation in E. coli and other prokaryotic cells; (2) an
ampicillin resistance gene for selection of plasmid-containing prokaryotic cells;
(3) an SV40 origin of replication for propagation in eukaryotic cells; (4) a CMV
promoter, a polylinker and an SV40 intron; and (5) several codons encoding a
hemagglutinin fragment (i.e., an "HA" tag to facilitate purification) followed by
a termination codon and polyadenylation signal, arranged so that a cDNA can be
conveniently placed under expression control of the CMV promoter and operably
linked to the SV40 intron and the polyadenylation signal by means of restriction
sites in the polylinker. The HA tag corresponds to an epitope derived from the
influenza hemagglutinin protein described by Wilson et al., Cell 37:767 (1984).
The fusion of the HA tag to the target protein allows easy detection and recovery
of the recombinant protein with an antibody that recognizes the HA epitope.
pcDNAIII contains, in addition, the selectable neomycin marker.

A DNA fragment encoding the E. coli J96 PAI protein is cloned into the
polylinker region of the vector so that recombinant protein expression is directed
by the CMV promoter. The plasmid construction strategy is as follows. The
E. coli cDNA of the deposited clone is amplified using primers that contain
convenient restriction sites, much as described above for construction of vectors
for expression of E. coli J96 PAI protein in E. coli.
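Adding restriction sites through the primers means building each primer from three parts: a short 5' clamp, the recognition site, and a sequence that anneals to the gene (for the reverse primer, the reverse complement of the gene's 3' end). Below is a hedged sketch. The text does not name the enzymes used for the pcDNAI/Amp construct, so BamHI (GGATCC) and XbaI (TCTAGA) are only examples, chosen because they appear among the pC1/pC4 cloning sites named above. The 4-nt "GCGC" clamp and the 6-nt annealing length are assumptions made for illustration; real primers typically anneal over roughly 18-25 nt. The sketch also ignores reading frame. For the C-terminal HA fusion described above, the 3' primer would additionally need to drop the native stop codon and keep the tag in frame.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def site_tailed_primers(gene: str, anneal_len: int, fwd_site: str,
                        rev_site: str, clamp: str = "GCGC") -> tuple[str, str]:
    """PCR primers whose 5' tails add a restriction site to each end of `gene`.

    forward = clamp + fwd_site + first `anneal_len` nt of the coding strand
    reverse = clamp + rev_site + reverse complement of the last `anneal_len` nt
    """
    gene = gene.upper()
    if not 0 < anneal_len <= len(gene):
        raise ValueError("anneal_len must be between 1 and len(gene)")
    forward = clamp + fwd_site + gene[:anneal_len]
    reverse = clamp + rev_site + reverse_complement(gene[-anneal_len:])
    return forward, reverse


def sites_inside(gene: str, *sites: str) -> list[str]:
    """Sites that also occur within the insert and would cut it during digestion."""
    g = gene.upper()
    return [s for s in sites if s in g or reverse_complement(s) in g]


if __name__ == "__main__":
    toy_gene = "ATGAAACGTCTGCTGTAA"  # hypothetical ORF, not a J96 sequence
    bamhi, xbai = "GGATCC", "TCTAGA"
    print(site_tailed_primers(toy_gene, 6, bamhi, xbai))
    print("internal sites:", sites_inside(toy_gene, bamhi, xbai))
```

Checking for internal sites first matters because an enzyme that also cuts inside the insert would fragment it during the digestion step that follows.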

The PCR-amplified DNA fragment and the vector, pcDNAI/Amp, are
digested with appropriate restriction enzymes for the chosen primer sequences
and then ligated. The ligation mixture is transformed into E. coli strain SURE
(available from Stratagene Cloning Systems, La Jolla, CA 92037), and the
transformed culture is plated on ampicillin media plates, which are then incubated
to allow growth of ampicillin-resistant colonies. Plasmid DNA is isolated from
resistant colonies and examined by restriction analysis or other means for the
presence of the E. coli J96 PAI protein-encoding fragment.

For expression of recombinant E. coli J96 PAI protein, COS cells are
transfected with an expression vector, as described above, using DEAE-dextran,
as described, for instance, in Sambrook et al., Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
(1989). Cells are incubated under conditions for expression of E. coli J96
PAI protein by the vector.

Expression of the E. coli J96 PAI-HA fusion protein is detected by
radiolabeling and immunoprecipitation, using methods described in, for example,
Harlow et al., Antibodies: A Laboratory Manual, 2nd Ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, New York (1988). To this end, two days
after transfection, the cells are labeled by incubation in media containing
35S-cysteine for 8 hours. The cells and the media are collected, and the cells are
washed and then lysed with detergent-containing RIPA buffer (150 mM NaCl, 1%
NP-40, 0.1% SDS, 0.5% DOC, 50 mM Tris, pH 7.5), as described by
Wilson et al., cited above. Proteins are precipitated from the cell lysate and from
the culture media using an HA-specific monoclonal antibody. The precipitated
proteins are then analyzed by SDS-PAGE and autoradiography. An expression
product of the expected size is seen in the cell lysate but not in the
negative controls.
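Preparing the lysis buffer means converting the stated concentrations into weighed amounts. Below is a sketch for an arbitrary final volume. The molar masses are standard values for NaCl and Tris base, not figures taken from the text. Reading the SDS and deoxycholate percentages as w/v and the NP-40 percentage as v/v is an assumption, since the text gives only percentages.

```python
MOLAR_MASS_G_PER_MOL = {"NaCl": 58.44, "Tris base": 121.14}


def ripa_recipe(volume_ml: float) -> dict[str, tuple[float, str]]:
    """Amount of each component for `volume_ml` of the RIPA buffer above.

    Assumptions: SDS and deoxycholate percentages are w/v, NP-40 is v/v,
    and Tris is weighed as Tris base and then titrated to pH 7.5 with HCl.
    """
    litres = volume_ml / 1000.0
    return {
        "NaCl (150 mM)": (0.150 * litres * MOLAR_MASS_G_PER_MOL["NaCl"], "g"),
        "Tris base (50 mM)": (0.050 * litres * MOLAR_MASS_G_PER_MOL["Tris base"], "g"),
        "NP-40 (1%)": (0.01 * volume_ml, "ml"),
        "SDS (0.1%)": (0.001 * volume_ml, "g"),
        "sodium deoxycholate (0.5%)": (0.005 * volume_ml, "g"),
    }


if __name__ == "__main__":
    for name, (amount, unit) in ripa_recipe(100).items():
        print(f"{name}: {amount:.4g} {unit}")
```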

Example 6(b): Cloning and Expression in CHO Cells 

The vector pC4 is used for the expression of an E. coli J96 PAI protein.
Plasmid pC4 is a derivative of the plasmid pSV2-dhfr (ATCC Acc. No. 37146).
The plasmid contains the mouse DHFR gene under control of the SV40 early
promoter. Chinese hamster ovary or other cells lacking dihydrofolate reductase
activity that are transfected with these plasmids can be selected by growing the
cells in a selective medium (alpha minus MEM, Life Technologies, Inc.) supplemented
with the chemotherapeutic agent methotrexate. The amplification of the DHFR
genes in cells resistant to methotrexate (MTX) has been well documented (see,
e.g., Alt, F. W. et al., 1978, J. Biol. Chem. 253:1357-1370; Hamlin, J. L. and Ma,
C., 1990, Biochim. et Biophys. Acta 1097:107-143; Page, M. J. and Sydenham,
M. A., 1991, Biotechnology 9:64-68). Cells grown in increasing concentrations of
MTX develop resistance to the drug by overproducing the target enzyme, DHFR,
as a result of amplification of the DHFR gene. If a second gene is linked to the
DHFR gene, it is usually co-amplified and over-expressed. It is known in the art
that this approach may be used to develop cell lines carrying more than 1,000
copies of the amplified gene(s). Subsequently, when the methotrexate is
withdrawn, cell lines are obtained which contain the amplified gene integrated
into one or more chromosome(s) of the host cell.

For expression of the gene of interest, plasmid pC4 contains the strong
promoter of the long terminal repeat (LTR) of the Rous Sarcoma Virus (Cullen
et al., Molecular and Cellular Biology, March 1985:438-447) plus a fragment
isolated from the enhancer of the immediate early gene of human
cytomegalovirus (CMV) (Boshart et al., Cell 47:521-530 (1985)). Downstream
of the promoter is a BamHI restriction enzyme site that allows integration of the
gene. Downstream of these cloning sites, the plasmid contains the 3' intron and
polyadenylation site of the rat preproinsulin gene. Other high-efficiency
promoters can also be used for the expression, e.g., the human β-actin promoter,
the SV40 early or late promoters or the long terminal repeats from other
retroviruses, e.g., HIV and HTLVI. Clontech's Tet-Off and Tet-On gene
expression systems and similar systems can be used to express the E. coli protein
in a regulated way in mammalian cells (Gossen, M. and Bujard, H., 1992, Proc.
Natl. Acad. Sci. USA 89:5547-5551). For polyadenylation of the mRNA,
other signals, e.g., from the human growth hormone or globin genes, can be used
as well. Stable cell lines carrying a gene of interest integrated into the
chromosomes can also be selected upon co-transfection with a selectable marker
such as gpt, G418 or hygromycin. It is advantageous to use more than one
selectable marker in the beginning, e.g., G418 plus methotrexate.

The plasmid pC4 is digested with appropriate restriction enzymes and
then dephosphorylated using calf intestinal phosphatase by procedures known in
the art. The vector is then isolated from a 1% agarose gel.

The DNA sequence encoding the complete E. coli J96 PAI protein,
including its leader sequence, is amplified using PCR oligonucleotide primers
corresponding to the 5' and 3' sequences of the gene.

The amplified fragment is digested with appropriate endonucleases for the
chosen primers and then purified again on a 1% agarose gel. The isolated
fragment and the dephosphorylated vector are then ligated with T4 DNA ligase.
E. coli HB101 or XL-1 Blue cells are then transformed, and bacteria that contain
the fragment inserted into plasmid pC4 are identified using, for instance,
restriction enzyme analysis.
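For the T4 DNA ligase step, the amount of insert is usually chosen by molar ratio rather than by mass. Below is a minimal sketch of that common rule of thumb. The 3:1 insert-to-vector default, the average of about 650 g/mol per base pair, and the example sizes are all assumptions, because the text gives none of them.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one base pair of dsDNA


def insert_ng(vector_ng: float, vector_bp: int, insert_bp: int,
              insert_to_vector: float = 3.0) -> float:
    """Insert mass that gives the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector


def pmol(ng: float, length_bp: int) -> float:
    """Approximate picomoles of a linear dsDNA fragment."""
    return ng * 1e3 / (length_bp * BP_MASS_G_PER_MOL)


if __name__ == "__main__":
    # Hypothetical: 50 ng of a 5,000 bp linearized vector, 1,000 bp insert.
    ng = insert_ng(50, 5000, 1000)
    print(f"use {ng:.0f} ng insert ({pmol(ng, 1000):.4f} pmol vs "
          f"{pmol(50, 5000):.4f} pmol vector)")
```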

Chinese hamster ovary cells lacking an active DHFR gene are used for
transfection. Five µg of the expression plasmid pC4 is co-transfected with 0.5 µg of
the plasmid pSV2-neo using Lipofectin (Felgner et al., supra). The plasmid
pSV2-neo contains a dominant selectable marker, the neo gene from Tn5, encoding an
enzyme that confers resistance to a group of antibiotics including G418. The cells
are seeded in alpha minus MEM supplemented with 1 mg/ml G418. After 2 days,
the cells are trypsinized and seeded in hybridoma cloning plates (Greiner,
Germany) in alpha minus MEM supplemented with 10, 25, or 50 ng/ml of
methotrexate plus 1 mg/ml G418. After about 10-14 days, single clones are
trypsinized and then seeded in 6-well petri dishes or 10 ml flasks using different
concentrations of methotrexate (50 nM, 100 nM, 200 nM, 400 nM, 800 nM).
Clones growing at the highest concentrations of methotrexate are then transferred
to new 6-well plates containing even higher concentrations of methotrexate
(1 µM, 2 µM, 5 µM, 10 µM, 20 µM). The same procedure is repeated until
clones are obtained which grow at a concentration of 100-200 µM. Expression
of the desired gene product is analyzed, for instance, by SDS-PAGE and Western
blot or by reversed-phase HPLC analysis.
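The initial plating concentrations above are given in ng/ml, while the later selection steps are in molar units. The short sketch below puts them on one scale and shows the size of each escalation step. The molar mass of methotrexate (454.44 g/mol) is a standard value, not one taken from the text, and only the steps up to 5 µM are included.

```python
MTX_MOLAR_MASS = 454.44  # g/mol, methotrexate (C20H22N8O5)


def ng_per_ml_to_nM(ng_per_ml: float) -> float:
    """ng/ml equals µg/L; dividing by g/mol gives µM, times 1000 gives nM."""
    return ng_per_ml / MTX_MOLAR_MASS * 1000.0


def fold_steps(schedule_nM: list[float]) -> list[float]:
    """Fold increase from each methotrexate concentration to the next."""
    return [hi / lo for lo, hi in zip(schedule_nM, schedule_nM[1:])]


if __name__ == "__main__":
    for c in (10, 25, 50):  # initial plating concentrations, ng/ml
        print(f"{c} ng/ml = {ng_per_ml_to_nM(c):.0f} nM")
    steps = [50, 100, 200, 400, 800, 1_000, 2_000, 5_000]  # nM, from the text
    print(fold_steps(steps))
```

Most steps are roughly twofold, which keeps each round mild enough for some clones to survive while still selecting for further DHFR amplification.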



Example 7 

Production of an Antibody to an E. coli J96 Pathogenicity Island Protein 

Substantially pure E. coli J96 PAI protein or polypeptide is isolated from
the transfected or transformed cells described above using an art-known method.
The protein can also be chemically synthesized. Concentration of protein in the
final preparation is adjusted, for example, by concentration on an Amicon filter
device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody
to the protein can then be prepared as follows:

Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes of any of the peptides identified and
isolated as described can be prepared from murine hybridomas according to the
classical method of Kohler and Milstein, Nature 256:495 (1975), or modifications
thereof. Briefly, a mouse is repeatedly inoculated with a few
micrograms of the selected protein over a period of a few weeks. The mouse is
then sacrificed, and the antibody-producing cells of the spleen are isolated. The
spleen cells are fused by means of polyethylene glycol with mouse myeloma cells,
and the excess unfused cells are destroyed by growth of the system on selective
media comprising aminopterin (HAT media). The successfully fused cells are
diluted and aliquots of the dilution placed in wells of a microtiter plate where
growth of the culture is continued. Antibody-producing clones are identified by
detection of antibody in the supernatant fluid of the wells by immunoassay
procedures, such as ELISA, as originally described by Engvall, E., Meth.
Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones
can be expanded and their monoclonal antibody product harvested for use.
Detailed procedures for monoclonal antibody production are described in Davis,
L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2
(1989).
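The dilution step above is usually planned with Poisson statistics. If cells land in wells at random, the mean number of cells per well determines how many wells stay empty and what share of growing wells probably started from a single cell. The sketch below rests on that random-seeding assumption and ignores plating efficiency. The text itself does not specify a dilution.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a well receives exactly k cells when the mean is `mean`."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def single_cell_fraction_of_seeded(mean: float) -> float:
    """Among wells that received at least one cell, the fraction with exactly one."""
    p0 = poisson_pmf(0, mean)
    return poisson_pmf(1, mean) / (1.0 - p0)


if __name__ == "__main__":
    for mean in (0.3, 0.5, 1.0, 2.0):
        print(f"mean {mean} cells/well: "
              f"{poisson_pmf(0, mean):.0%} empty, "
              f"{single_cell_fraction_of_seeded(mean):.0%} of occupied wells single-cell")
```

Lower seeding densities trade more empty wells for a higher likelihood that each positive well is clonal, which is why positive clones are often re-cloned before expansion.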



Polyclonal Antibody Production by Immunization 



Polyclonal antiserum containing antibodies to heterogeneous epitopes of
a single protein can be prepared by immunizing suitable animals with the
expressed protein described above, which can be unmodified or modified to
enhance immunogenicity. Effective polyclonal antibody production is affected
by many factors related both to the antigen and the host species. For example,
small molecules tend to be less immunogenic than other molecules and may
require the use of carriers and adjuvant. Also, host animals vary in response to
site of inoculation and dose, with both inadequate and excessive doses of antigen
resulting in low-titer antisera. Small doses (ng level) of antigen administered at
multiple intradermal sites appear to be most reliable. An effective immunization
protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol.
Metab. 35:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum
harvested when its antibody titer, as determined semi-quantitatively, for
example, by double immunodiffusion in agar against known concentrations of the
antigen, begins to fall (see Ouchterlony, O. et al., Chap. 19 in: Handbook of
Experimental Immunology, Weir, D., ed., Blackwell (1973)). Plateau
concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum
(on the order of 1 µM). Affinity of the antisera for the antigen is determined by
preparing competitive binding curves, as described, for example, by Fisher, D.,
Chap. 42 in: Manual of Clinical Immunology, 2nd ed., Rose and Friedman (eds.),
Amer. Soc. for Microbiol., Washington, D.C. (1980).
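Converting the plateau range to molar units takes one line once a molar mass is assumed. The sketch below assumes the antibody is IgG of about 150 kDa. Serum antibody is actually a mixture of classes, and IgM pentamers are much larger, so the result is only an order-of-magnitude estimate.

```python
IGG_MOLAR_MASS = 150_000.0  # g/mol; approximate for IgG (assumption)


def mg_per_ml_to_uM(mg_per_ml: float, molar_mass: float = IGG_MOLAR_MASS) -> float:
    """mg/ml equals g/L; dividing by g/mol gives mol/L, times 1e6 gives µM."""
    return mg_per_ml / molar_mass * 1e6


if __name__ == "__main__":
    for c in (0.1, 0.2):
        print(f"{c} mg/ml of IgG = {mg_per_ml_to_uM(c):.2f} µM")
```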

Antibody preparations prepared according to either protocol are useful in
quantitative immunoassays which determine concentrations of antigen-bearing
substances in biological samples; they are also used semi-quantitatively or
qualitatively to identify the presence of antigen in a biological sample.

While the present invention has been described in some detail for
purposes of clarity and understanding, one skilled in the art will appreciate that
various changes in form and detail can be made without departing from the true
scope of the invention.

All patents, patent applications and publications recited herein are hereby
incorporated by reference.
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANTS: Human Genome Sciences, Inc. 

9410 Key West Avenue 
Rockville, Maryland 20850 
United States of America 

University of Wisconsin 
1300 University Avenue 
Madison, Wisconsin 53706 
United States of America 

APPLICANTS / INVENTORS : Dillon, Patrick J. 

Choi, Gil H. 
Welch, Rodney A. 

(ii) TITLE OF INVENTION: Nucleotide Sequence of Escherichia coli 

Pathogenicity Islands 

(iii) NUMBER OF SEQUENCES : 142 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Sterne, Kessler, Goldstein & Fox P.L.L.C. 

(B) STREET: 1100 New York Ave., N.W., Suite 600 

(C) CITY: Washington 

(D) STATE: DC 

(E) COUNTRY: USA 

(F) ZIP: 20005-3934 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

(B) COMPUTER: HP Vectra 486/33 

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: To be assigned 

(B) FILING DATE: Herewith 
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/031,626 AND US 60/061,953 

(B) FILING DATE: 22-NOV-1996 AND 14-OCT-1997 

(viii) ATTORNEY /AGENT INFORMATION: 

(A) NAME: Steffe, Eric K. 

(B) REGISTRATION NUMBER: 36,688 

(C) REFERENCE/DOCKET NUMBER: 1488.074PC02/EKS/CBM

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (202) 371-2600 

(B) TELEFAX: (202) 371-2540 

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1178 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 






WO 98/22575 



PCT/US97/21347 




(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 



CNTANATTAG GCCTGCTNAA TGTATTTATA TCTAAAAAAA TTCGCATCCA AAAGGAATCC 60

AATCTGTACT GTTTTTTCTT GTGCTGACAT CTTCTTTTCC CTGGCTGGTA TGGCAAGTGA 120

CGGAGACAAG AGAAACGTTT TAAGCTCAGT TATCTCCGCC ATCACTTTCC ACGAATGACA 180

AGTAATTTTG CCTATTTTAA AACCATGCAA AAGGCAGGGT AAAAGGAGAA AATTCGATCG 240

AATCGATCGA CAAAATCGAT CATACATGAT GAAGATTTCT TATCGAATCC ATAAAAATAG 300

TGACAGCTAA CCGGCGTTGC AGGAACAGTC AGAAATGGGC GTTTGGGAAA GAGCCATAGC 360

ATACGTCGTC GCTGACATAG AGGAACTGTG CTTTGTTGAT AAGATCCTTT ATACGGCAAC 420

CAATCCACTG GACAAAAGAT GAACTACGTA ATCACCGGGT TCTCACTGAC GAAATACAGA 480

AGTTAATGAC ACAACTGTGC CATGCACCTT GTACAACAGC GGTGGAAAGC TCTCAGAACA 540

ATGGAATTGC AGAAAGGTGT TAAAACGATG AAAGCCTTCA TACCCAAATC GAATGTAAGA 600

ACGGCAGTAA AGACTGAATT GCGTAACCTT GCAGTAGCTC GAGTATTACA CTGCATAGTG 660


TGCAGGGT I A 


It I LLLAi to 


l\\jP^t\}\i\ ini L 




AATAACGTCA CCTTAGATGT 720

AGCAGTTGCC AAATAGTGAC TCAAGGGCGG GCTTACCGCA TACACTGACA CTTAGCGGAT 780

CGACAGAATA TTATTAGCAG ATCATCACTG AACGCTACGT AATTATCGTA ATAAAGGCTT 840

TTTCTGGCTA CCAGGAAGAC CTGACATGGC TCTGCTCTGG AACCAGGCCG CAGGAAGCAT 900

CAATCTGGAG TTTATCAGCT ACTGGAATTC CGGTGTATTG GCAGCCCCTG ATAATCACCT 960

GACCCACGAA GAGCGCTCTG CTTTGCAGAA ACTCTGGGGC GGTTTGGAGA CAGGAGATGT 1020

AACGATTATA GGACGTTCTG ATGAAGTCCA TGATTTTACC TCCGCCTTAA TTAACTGTTT 1080

TCTTTCTGAA GAAGAAATTG TCTGGTGGCA ATCAGGTGGC ATTTTCCCGG ATCCTTGGCC 1140

CGCTAATATA TCCCGGCTGA ACTGACGATT AACGCGAT 1178



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 414 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

ATCCTATTCA TTTTGCCATG ACGGGCGAAC TCCAGATAAA GGTTTTGAAA GTAATGAGAA 60 

ATTATTAATT CATCCATGTT ACTGGCTTGG TTTGAATCTA AATCGTAATG CACTTGCTCC 120 

AGAGGAAGCA GAGGAGATAA ATGACGAATA TGATATTAAT ATTATTTCAG ATAATTCAGC 180 
CATTAGAAAT AAAACAATAG GTCAAATAAC TACTCATCTA GATCAGATAC CGATAGGAAA 240

TGAAGGTGCC ACTGAATTTG AACAATGGTG TTTAGACGCA CTAAGAATAG TATTTGCATC 300

CCACCTAACA GACATCAAGT CCCATCCAAA TGGTAACGCA GTTCAGAGAC GAGATATTAT 360

AGGCACCAAT GGTGGCAAAT CTGAWTTTTG GRAACGAGTA TTGGAGGACT ATAA 414

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8752 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

TTGGGATCTG GTACANTCCA CCCAGCGGCA TTATCCNGAA GGCAATATTT TTAAGGATTA 60 

TTCGTCCACA AAATCAGTAC TGGAACCAGG CTCAAAAAAG GCTTTAACGT GACCTGCTNC 120 

CATCTACAGT AGATGTACAA CCTGTTAAGT TAATTGAAAA TGGTGTTAAT CCGGTTGTTT 180 

CTCCAGGGGT AGCAAGGGCC TTATTCGATA CAGTGGGTAA TGTTACTGTA AAATTACCAT 240

CATTCCCTGT GGTCACATTG CAGGTCTGAG CTACAACTTT GCCTGTAAAC GTAATTGTTC 300

CGTCATAGGC CATAGCTGAA CCAACAAACA CAGCAGAAAC AAATGTAGCC AATGCTATAA 360

CTTTTATTTT CATAAAATGA ATTCCTGTTT AATTCCGGTA TTGATCATTT GTTCAGCAAT 420

CATCCCCAAC AAAACAATCA TTTTCAAAAT GTTTTTACCG ATCGATAACC AGCACATGAT 480

AGATTGCACC TATCATGATT GCTAAAACGA TCGGGAAAAG CGATCAAAAA CCATATTTAT 540

TGTGTTGGTA ATGACAAAAG ATATGCTTTA CCCTGAAATG AGCGACCTAT TCATGAAAAT 600

ATGTAGGTCT GTATTTGATT ACTATCATTG CTATATTTCC ACTATCCAAT TTATATTTCA 660

TGATTAAAAT ATACCTTTTT ACACTATTAT TTATTTGTTG CAGCTTGCCT GGCTTTATCT 720

TATTCCGACT ATTTTATGGT AGATACAGAA TACAATTAAT TAAACTTATT TAAAGATTTT 780

ATAAATACCA TATTGGAGTT GACCGATAGA TACCTACTAA CAAGAGCAAT CACCACCACC 840

CCATGAGGTG TTTAGGAATA CAATCAATAA ACAACATCCA TGCCCGGCGA CGTACATACC 900

TGTTTGCTAT GATATCTGTT ACGCTACGCT TGCTAATTTA CTGAAACTCA GCATCTGTCG 960

ACGGAGATTC GTCCGGGCCC TGATACAACA AGGGCAAGAA AACCACCCGA AATACAGATA 1020

TTCTTATAAA AATGGATCAT ATTTCCATGT GCAAGTTCAG CTGGCATCGT CCAGAATGCG 1080

TGTCCAAGAA ATGAAGCAAA CACGGTATAC AGGCACAGAA TAATGCTCAC TGGCCGGGTG 1140

AAAAAGCCRA AAACAATCAT TAATGCTCCA ACGATTTCGA CAAGGACCAC TATTGCTGCA 1200

GTAATCGCCG GAAATATAAG CCCAAGAGAG GCCATTTTAT CGATAGTGCC AGTGAATGAT 1260

AGCAGCTTGG GAACGCCGGA TATCATATAA AGGCATGCCA GCATCAGACG GGCAAGGAGC 1320

AACAATGCCG ACGTGTAATT TCCCATATTA AAATACCTGA TTTTATCCAC TATCAATGCT 1380

CAGTCTCCTT GTTTCTGATA AAGCCCTGAG CCAAATCCTT AAGTGTACGA GCACCACTCA 1440

GTAACATTGC CGTCCTCAGC TCCGTCTTCA GGTGCTCAAT GACACTGGCA ACGCCCCCGA 1500

CACCACCTGC TGCGATGCCA TAAAGAACAG GACGTCCGAC CGCAACAGCC GTTGGCCCAA 1560

GAGAGATAGC CCTTACAACA TCAACCCCCC TGCGAATACC GCTGTCAAAA ATGACCGGAA 1620

CTTTGTGCCC GACTCTTGCA GCAACTTCCT GCAACTGGCT GATGGCAGAA GGAACACCAT 1680

CAATCTGGCG ACCACCATGA TTAGACACCT GGATGGCATC TGCTCCTGCA TCAATGGCGA 1740

CCACTGCATC CTCACGTCTG AGGATGCCCT TGACAATGAC TGGCAGCCCG GTGATTTTTT 1800

TTACAAACTC AATATCAGCC GGGGTCAGCT CAACTTTTTG GTTAAAAAAA TCACCTTTGC 1860

CACCGTAACG GGGGTCATGA TTACCGAACG TCGCTCCTGC AGGGAAAGGC GAGCTCATGC 1920

TGAGAAAAGG ATCACTTGTC CCGGGACCAA GCGCATCCGC TGTGATAATA ATGGCTGAAT 1980

AGCCTGCCGC TTTTGCACGC TCCAGTAAAC TTCGGGTCAC ACCAGCATCC GCGTTAAAAT 2040

ACAGCTGGAA GCATTTAGGT CCTTTACTGG CTTTTGCAAT ATCCTCCAGA GAGCGGTTGG 2100

ATGCCCCTGA TGATTCATAA AGTGCCCCGG CCTTTTCTGC ACCCGCTGCA GCAATCACCT 2160

CCCCTTCCGG ATGGACGAAC ATATGCGCGC CCATAGGTGC TATCAGCAGG GGATGTTCCA 2220

GATGATGGCC CAAAAGGTCA GTCCGGATAT CAATGCTGTG GGCAGCAACT CCACTGAGTC 2280

GGTGAGGTAA CAAAGGATAA TCACTGAANT GCCTGCGGTT CTCATGATAG GTCCACTCAT 2340

CTCCAGCACC ATGAGCAATA TATGCATACG CAGCTTCCGT CATCACATCT TTTGCTGAAG 2400

TCTYCAGTCT GTCCAGACTG ATGATATGAA GAGATTTGCT GGTCGATGTA TCAGCATGTC 2460

CAGACGTTTT ACTGATGATA TGTGCCGTTG AAGATGAGAT ATTTTTGGCA AGGGCCGGCG 2520

CAGTTGACAG CCTGCGGCAG ATATTCCTAA AACGGCATTC TGAATAAAAT TACGTCGGGA 2580

AAGAGGCATA ATAAGCTCCA TATATTATAA ATAAGCCAGG TCTCCCTGGC TTATAATGAT 2640

CATGCCACGC CCTGAAGCGG GTTGGTGTTG AAGGTATAAA GGAAAATTTT CCATTCACCA 2700

TTAATTTTAC TGAGGACAAA AACTTCACGG TTCAGGTC/VA TAATGGTTTT CTGCTCTTTA 2760

AAGTTCGTTA CAACAGAACC CACATGGTGG TGAGTGCGGA CAACCGCGGT ATCTCCGTTG 2820

ATCCAGATAG AGTCAAACGC AAAATCGGTC TCAAACTTTT CACGCTTGAA CAGATCATCG 2880

TACTGCCCCT GGCGTTTTTC TGTATTGTCA GCCGTCAAGT TATCATTCCA CTGGGAATAA 2940

CTTTCATCAG CAAACAGGCC CAGGATGGTT TTTGTATCCC CGGCATTCAG TGCGTTCTGA 3000

TACTTGATTA TCGTGTCATA CACGTTCTTC TGCTCAGTAG CAATCTTACT GTCTGTGGAG 3060

TATTTGAATG TACCGCCGGA TTGTTCAGGT GAGCTTTCCT TCTGTGCTGT CGACGATGAG 3120

GCAGCCAGAG CATTAGAGCC GAAAAGAAGG GATGATGCCA TGACTGCTGT TGCTATAAAA 3180

TGTTTCATAT ATTCTCCATC AGTTCTTCTG GGGATCTGTG GGCAGCATAT AGCGCTCATA 3240

CTATGCTGCT GTTTCAATAT TAGCGGCAGA CGTCAGCCTT ACCoCACTAC TTATTGGATA 3300

AGAATATCAA AAGTGACCGT GAAGTCAATT TTATCACAAC ACAGAAGGCC ACTATTTATG 3360

CCCAGAAAAT ATGAATCGTC CTCATCATGC ACGAAAGACT C 3TAGTTGCA GCCCGGAAAA 3420

AACTGCCAGG ACACGACAGC AGATAGCCCG GGCAGCACTT GAGGAGTTCT CTGCACAAGG 3480

GTTCGCTCGC GCCACATNCA GCAATATCAG CAAGCGCGCA GGAGTAGCTA AAGGCACGGT 3540

ATATAACTAC TTCCCAACAA AGGAATTATT GTTTGAAGCG GTTCTGAAGG AGTTCATTGC 3600

TACCGTCCGT ACTGAACTGG AATCTTCCCC CCGCCGCAAC GGGGNAAACC GTAAAAGCCT 3660

ATCTGTTGAG AGTGATGTTA CCTGCCGTCA GGAAAATTGA CGACGCATCA ACAGGCAGAG 3720

CCAGAATAGC CCACCTGGTT ATGACAGAAG GGAGCCGGTT CCCGGTAATC GCTCAGGCTT 3780

ATTTACGGGA AATACATCAG CCACTACAGC AAGCCATGAC CCAACTGATT CAGGAAGCAG 3840

CATCAGCCGG AGAGTTAAAA GCAGAGCAAC TGCTCTGCKT CCCCTGTTTA TTGCTGGCTC 3900

CAAACTGGTT TGGCATGGTG TATAACGAAT TCTGAACCCG GCAGCACCGG TCAGTACAGG 3960

CGATCTTTTT GAAGCCGGAA TTGGTGCTTT TTTCCGATAG ACACATAACT GTCAGTATTA 4020

TGACCATGCC GTCAGGAGGA GGTATACCAG TGATACCCTG CCATGACCCG GTAACGTCTC 4080

CTGGCTGCCT TAAACCTGAA AGACCTGGCC CCACCACACT GCCGGTTACG CATCAAGATG 4140

CAGCAACCCT TGCATAAGGC TGTTTTGTGC AGAGGGCTAC CGGAAAGATA ATAACGTCAC 4200

AGCCCGTATG CATCAGATAA AACAGTGTAT TTTATCTGTC AGCAGTCACT GGAGCGGATT 4260

GTGGGGCGAG ATTCAGGTGC TGATACTGTA ACGACTCTGC GCCGCTGCTG CGGTAAAAGC 4320

GGCTGCCACC AGGCACGGTT ATCAGAGGAG GATGACCGTG TCCGCCCCTG GTGGTGATGA 4380

ACTCTCCATG ACAATCAATA ATGCCGCCGG GTGGATGAAG CAGACAGGGA TGGCAAGTCC 4440

CACTATCCCG GATAAAATGG GCTCTGGGCG CTCAGAAGAC CTGTGTGTCA GGCAGGGGTG 4500

AGAACGGTGA TGTTTTTTGT TGTCTGAAAG TCCAGCTCCA GCATTGCCTG CCAGCCTCAA 4560

GACTTCCGCT TTCTGCCCTT TCCGGCATTT TCTTCCGTTA CCATCATTCT GTTAATTCAG 4620

AGGCGTAGTA GTAGTAAACG TAATACATAT CCGGGAGGAT GAAGTCATCT AATCCTGCTC 4680

CCCGAATATC ATACAGCCAT TCCTGAGTGT GACTGCACCA TTTCCAATTA TGCAGTCTGT 4740

CCTCATCACA AAAATGTTGC AAGCAGTGCG GAGTCACGTT CCGTATTCAT GCCCTCTGCC 4800

AGATATTGAG CGGGGGAGAA ATGTGTAAGC GTCAACAGAG CGCCGTATTG ACACTTATTT 4860

ATCGGTGAAA ACTACGTTCC ATGGCAGCAG TTCGTCAACA CGGTTGGAGG GCCATTCCGG 4920

CAGTACGCTC AGGATATGGC GCAGATACGC TTCTGGATCG ATACCGTTCA ACCGACAGCT 4980

CCCGATTAGT CCGTACAGCA GAGCTCCGCG CTCGCCTCCA TGATCGTTGC CGAAGAACAT 5040

GTAATTCTTT TTCCCGAGAC AGACGGCACG AAGCGCTCTT TCTGCTGTGT TATTGTCCGC 5100 

CTCCGCCAGA CCGTCATCAC TGTAATAACA GAGGGCGTCC CACTGATTCA GGACATAGCT 5160 

GAACGCTTGR CCCAGTCTGG ATTTTTTCGA CAACGTGCCA TTCTTCTCCA CCATCCATTC 5220

ATGCAGCGAC GTCAGTAACG CTTTGCTTCG CTGCTGCCTG GCTGCAAGAC GTTCAGACTC 5280

CGGTAAGCCC CGTATTTCAT CMTCAATGGC GTACAGTTCA CTGATGCGCT TCAGAGCTTC 5340

TTCTGCCGTC GTACTTTTGG TGCTGATGTA TACATCGTGG ATTTTTCGCC GGGCATGGGC 5400

CCAGCACGCA ACTTCTGTCA GTGCACCACC TTCACGTTCG GCACTGAACA GCCGATCGTA 5460

ACCGCTGAAT GCATCCGCCT GCAGGATACC CCGGAAGGGA CGAAGGTGTT GTACCGGATG 5520

TTTTCCCTGC CTGTGTGGTG AGTAGGCGAA CCAGACCSCC GGTGGCTCTG ATGAGCCCGC 5580

ATTCCGGTCA TCCCSGACAT ACGTCCAGAT GCGTCCTGTT TTTGCCTTTT TTCTGCCCGG 5640

TGCCAGCACT TTTACTGGTA TGTCGTCAGT GTGAACCTTG CGGGTGTTCA TCACGTAACG 5700

GTACAGGGCA TCATTCAGCG GAGTCATTAA CTGGCAGCAC GCGTCAACCG AGTTGGAGAG 5760

TAATGCACGG CTCAGTTCGG CACCCTGTCG GGCAAAGATT TCACTCTGAC GATACAGTGG 5820

CAGGTGTTCG CAGTATTTTC CCGTTAACAC GCGGGCAAGT AATCCGGAGC CCGCGATGCC 5880

GCGCTCTATC GGGCGGGAGG GCGCTGGCGC TTCAACTATA CAGTCACATT TTGTACAGGC 5940

TTTTTTTACC CGAACAGTGC GGATCACTTT CAGGGCGCTA CTCACCAGTT CCAGCTGCTC 6000

AGCACTAACT TCACCCAGAT AATCCAGCTC ACTGCCACAC TCCGGGCAAC AACTTTCTTC 6060

AGGCTCCAGG CGGTGTATTT CACGGGGAAG ATGTGCTGGT AACGGACGAC GATGACGTGA 6120

TTGTCGCAAC TGGCGGGGAA CTGCGGGTCA TCCTCACGCC CACTGTAACG ATCGCTTTCC 6180

TGTTCGCGTT GTTTCAGTTG GGCCTCAGCC TGTTCAACCT CACGCTGCAG TTTTTCAGAA 6240

CGGGTACCGA ACAGCATCCG GCGCAGTTTT TCTATCTGGG CCCTCAGATG TTCTATTTCC 6300

CGCTCGTCCT CTTCGATCTT TTCTTCGGCA CGTGCCARTG CAGAGCGCAG GAAGGCCTCC 6360

GTCTGTTCAA CCAGACTCAG TTGCTGATCT TTCTGACGGA GGGCTTCAGC CTGCTCAGAG 6420

AGTAGCCTTT CCAGCTCAGT GATACGAATG AGGTATTTCC GACTCATGAC CGTTTTTATA 6480

ATCCGGCCAT GACATTTTTA CAACATTGTC AGTGCATTAA GGCGGGATGT TTTGGGTTGA 6540

CGCCAGTCCA GTTTATCGAG GAGCATTGGC AGCTGCGAGC GGGTAATGGA TACCTTACCG 6600

TCACGCACCG CAGNCCAGAT AAACTGGCCT TCCTGCAGAC GTTTGGTGAA CAGGGACAGA 6660

CCATCAGCAT GAGCCCACAG GATTTTAATC GTGTCACCCC GTCGGCCGCG AAAGATAAAC 6720

ACATGTTGTA CCTGTTCACC CAGACCGTTG 6780

AAGGATTTAC GGATATCAGT AACGCCGGCA ACCAGCCAGA TTCGAGTGTC TGATGGGAGC 6840

GAGATCATCG TCCTCTCCCG GTCAGTTCAC GGATCAACAC CGTGAGCAGC TCTGGTGAAG 6900

GATTTTCCAG CGTCATGTTA CCGTGGCGGA ACTCAACTTT ACAGGAACTG GCACTGACTG 6960

TGCTTTGTGA AGGAGTGGAT AAAAGGGGAG TAAGAGCCGC CATAGGCTCT TTCTGCTCAT 7020

CAGGCGTTAT GTCAAGAGGT AATAATTCAA CGCCAGCGCC AGAAGAGGTT GTTACCGGAA 7080

GACGCCGCGA TATACGCCCT TCGTTCTGCC AGAGCCTGAG CCATTTGAAC AGGAGGTTAT 7140

CATTGATATC GTGTTCCCTG GCAATACGGG CAACAGAGGC TCCTGGTTGT GAAGCCAGTT 7200 

TAACCATTTG AAGTTTAAAC TCATTTGAAA ATGTTCTGCA GG ^TTCTGCG GATAATATTT 7260 

TCTGTTCCAT AACAGGTGTC CACTAGTTGA AAAAGTGGGC ACCTACGTTA CCAATACTGG 7320

CTTAATGGCT ACATACGGCG GTGAGTTTAC GGTTACAGAA ATGTAATGAA CAGGTGCTAC 7380

CATTAACTGA AGAGCATGGT GACGGATGAA GGAAAAAGCA GGAGTGTGTG GTGCCTCACA 7440

GATTTCCGAG ATCATAGGTG TCAAGGACGG ATGAAAAGCG GCTCTTCCGC AACTTGGGTG 7500 

GAAGAAAATG GATGAAACTT TCTGGTGTGA GAACGTTAAG GAAACAACAT GTTGGGTGGA 7560

GCGGACAATG CAAATGGTGA ATTACGGTGT TATATCACTG GCGCTGACAT TCCGGGCGTC 7620

TTCTCCGCCA CAACGCCATT TGCAGTGCAT CACAGGCCAG TTGTGCTGTC ATTCGCGGTG 7680 

ACATCGACCA GCCAATAACG GCGCGTGACC ACAGGTCGAT GACTACTGCG AGATACAACC 7740

AGCGGTCATC GGTACGCAAG TAMGTGATGT CACCCGCGCA MTTGTGGTTC GGAGCCTGGC 7800 

GCTGAAGTTC CTGCTCGAGC AGATTGTCCA ATACGGGCAG GCGATGTGCA CGGTAGCTGA 7860

CCGGGCTGAA CTTCCGGCTG CTTTCGCCCG CAGCCCCTGA CGACGCAGGC TGGCGGCAAT 7920

GGTTTTAATA TTGAACTCCG GCATTTCGTC AGCAAGGCGG GGAGCACCGT ATCGCTGCTT 7980

TGCCTCAATG AATGCCTTAT GGACAGCGGG ATCGCAGGTG AGCCGAAACT GTTGGCGCAG 8040 

GCTCATCTGG TGACGACGCC TGAGCCAGAC ATACCAGCCG CTGCGGGCAA CCCGAAGTAC 8100 

ACGACACATC GCTTTGATGC TGAAGTCTGC CCGATGATTT TCGATGAAGA CATACTTCAT 8160 

TTCAGGCGCT TCGCGAAGTA TGTCGGGGCC TTTTGGAGGA TGGCCAGTTC CTCAGCCTGC 8220 

TCCGGCAGTT GTCGTTTAAG GCGGACATTT TCAGCGGCCA GTTCGGTTTC GCGCTCTGAC 8280 

GAACTCATTT GTTGCTGCTG TTTACTGCGC CAGGCATAAA GCTGAGATTC ATACAGGCTG 8340

AGTTCACGGG CTGCGGCGGC CACACGGATG CGTTCAGCGA GTTTCAGGGC TTCGTTACGA 8400

AATTCAGGCG TATGTTGTTT ACGGGGCTTC TTGCTGATTG ATACTGGTTT TGTCATGAGT 8460

CACCTCTGGT TGAGAGTTTA CTCACTTAGT CCTGTGTCCA CTATTGGTGG GTAAGATCAC 8520 

TCAGCAACGT ATCAAAAGTC TGTAAAATGA TGGGCGTTTC GCGTGATACA TTTTATCGTT 8580

ACCGCGAACT GGTCGATGAA GGCGGTGTGG ATGCGCTGAT TAATCGTAGT GCCGCGCTCC 8640

TAACCTTAAG AACGTACCGA TGAGGCAACT GAACAGGCTG TTGTTGATTA CGCCGTCGCT 8700

TTCCCGGCAC ACGGTCAGCA CCGGACCAGC AAACAAGCTG CGTAAACAGG GC 8752
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2417 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TGGTCAAAGA TGCAACTGCA TTTCGTCGCG GCTTTGCGGC AAATACTTAC ATCGCAGAAA 60 

TACTGTGCGG AAATCTGCAT CCATTTCCAC TTGCTGTATG GCATAACTTT TCAGGCGGTC 120 

CGGATACTGC CGAAGATTAT TATGCCACAT ACCACCCGTT ATGGGGGCAA TATCCGGAAG 180 

CATTGCTGTT TGTAAACTGG CTCTATAATC ATTCCTCTGT GCTGCATGAA CGGGCAGAAA 240

TCATTAAATG CGCCGAAATG CTGATGCAGG AAGATGATTT CGAAATATGC GAAAGTATTT 300 

TAAGACAGCA GGAGAAGTTG CGTGAAAGAA TTGATGAGAC GCTTTCTGAG AAAATTGTAC 360

AGAAATGCAG AAATATGAAT GGTGAATATC TCTGGCCCTG GATATTGCCG TTTTCAGCGG 420

CAGGCATGAA ACATACTGGC ATACAGTATC AGTAGATATT GCATTAGTGT ATCCTGCACA 480

CAAGTAATAA TTTATCCACC AATAATAACA CTGTTAATGT CCCCTTCCCC TGGTTGTCAG 540

CCAGGGGTTA TCTTCTGAAT ATTTCTTTTG AAAAGGATAA CACAATAAAT TATTTTTATG 600 

AATTATCCCA TGGACTCATT AACACCCTTT CATAATGTTT TATTGTCAAA CACGTTATGG 660 

CTGACATCAA AAAAAACCGG ATTTCCTCTG CCAGCGGGTA ATCACCTCCC CGGTGTTTTC 720 

GGTTGGTCTG GTTACTCCTG TCTGGTTATT AGCAAGATAA TTGCTATAAA CAGTGGAAAA 780 

CTCATCGTAC ATAATCTGGT GATGAACATT ACGCTTATTT TCCCTTGACC GGAAGAATCA 840

GAGGCTGCGG TTTCAGACTG TCTGCCGGTA CATTCCTCTC TCCGTTAAAA ACCATAATGG 900 

GTTCATTATC TTCGTCTGTC AGTAGATTGA ATGGCGGTAT ATTTTCAGTA CGAATGCCGG 960 

TCAGCCACTG AAAAATACCT GCGAAATGAC GGGCACTGAT TTTTCTGCTG ACGGACTGAT 1020 

GAGACGTGAT GTCACTGGCG GTAATAATCA GGGGAACGCT GTAGCCTCCC TGCACATGAC 1080

CATCATGATG AACAGGATTA GCACTGTCGC TGACCGACAG CCCATGGTCA GAAAAGTAAA 1140

GCATGACGAA ATGACGGGAA TGCCGGCGAN GGATACCATC AAGCTGACCG AGAAAGTTAT 1200 

CCAGTTTACT GATGCTGGCG AGGTAACAGG CAACCTTTCG GGGATACTGC TCCAGGTAAT 1260

GATTCGGCCA GGAGTGAAGC CGGTCACACG GGTTCGGATG AGACCCCATC ATGTGCAGGA 1320 

ATATCACCTT CGGAGAGGAT TTATCCGCCA GCGCACGTTC TGTTTCCTGT AACAACAACA 1380

TGTCATCCGT TTTACGGGAA GCGAATGCSC TTTCTTGAGG AAAACGGTAT GCTCCGCATC 1440

AGAAGCAATA ACAGAGATGC GTGTGTCATG CTCTCCCAGT TTTCCCTGAT TGGATATCCA 1500 

CCATGTGCTG TATCCTGCTT TTGCTGCCAG CGCCACCACG TTGTTGCCGG AATCAGGGTT 1560 

CTGCTCATAG TCATAAATCA GTGTCCSGCT CAGGGAAGGT ACGGTACTGG CTGCTGCCGA 1620 

TGTATAGCCG TCAATAAATA AACCGGGAGC TGTCATTCCA GCCACGGCGT GGTTGGCCAC 1680 

GGGATAACCA TATACCGACA TATAATCCCT GCGCACACTC TCACCAGTGA CAATCACAAT 1740

CGTGTCATAT AACGGTGTTC CCCGGCCAGG ATTTTCCCAG TTGTCAGCCC CGTGCTGACT 1800 
CAGTTGTTTA TAATGCTGCA TTTCACGCAA TGTGTCAGTT GTCCCCACAA CAGTTCCTTT 1860

AACCATCCGC AACGGCCAGC TGTTTACTGA GCATAATACG AACAGCAGCA GTGCCAGCCA 1920

GTTACGGTGA CCACGGCGGT GTGTTCGCCA GAAAATCACC ATGAATACCT GAATCGCGGC 1980 

ACTGACCAGA AAATGATAAA CAGGAATCAT CCCGGTAAAC TCCGCTGCCT CATCAGTTGT 2040

GGTCTGCAGC AACGCGACAA TAAAACTGTT GTTGATTTTA CCGTACGTCA TACCGGCAGG 2100 

CGCATACAGT GCACAACAGA ACAGAAATAA CAGCGCTGTA ATGGATGTGA GGGTATTTCT 2160 

GTGTGCAAGG AGCAGAAGGA GAAACAGAAG CAGCACATTT CCTGTTGCAT TCCTCTCAGT 2220 

GTATCCGCAT GCAATTGTGG TTATTGCAGA CACAACAAAA AAGAATAAAA ACAATAAAAT 2280

CCGGGGGGGG TTGCCCGGAC AAAACAGTTT TCTGATATTC ATCGGAGTAT ATCGACAACA 2340

TTATTATGAA GAGAACAGGA TAATAAAAAT CAGAAATTAT TGTAAAACAG ATAAAAGCAN 2400

CNATGCAGTA ATAGACT 2417
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6294 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

AGACAAAAAC CAGTTACGGT TATCACGTAC CAGCCCCCGT ATTTCCAATT TATAATCCTG 60 

GCCATCAATT ACTGGGATCT CTTCTTCTCC ATAGAAGGCA TTAAAAGGGA ATGGAGTGGT 120 

AATGTCCTCT GGAAGATATT CTGGTGCCAC ACTGTTTTTG CTGAACAGAA AACTTTGAAT 180 

CCGGTCATTA AATCTGGATA TACGGAACAA TGCTTTTTCA ATATCATCAT TATTGCTTAT 240

ATCACAGCCA GTCAGCATCA TAATTCCCCC AAGCGTCAGT CCCTGTTGGA GTAAACGACG 300 

TCTGTCCGGC GCAAGGATTT TTTCTGCATC TTTCACCACG TAATGGGCAT CACTGTCAGA 360

CAAAAAACGT TTTTTCTTCA TTAGTGACCC CGTATCATAG ATAACAATGC ACGCGGAACC 420

AATAACACCA TAACCAGGTG AATAATAATG AACAGTACCA TAATGTTCAT GCACAGAAAG 480

TGGATATAAC GCGCTGTATC ATAACCACCG RATAGTATAG TCAGAAGGGA AAACTGAACG 540

GGTTTCCATA AAACCAGACC AGACAATAGA AGAGCAGCGC CATCTAAAAT AATCAGAATA 600 

TAGGCGACTT TTTGCACCAT ATTGTATTCC TGCATATTCG TATGATGCAG CTTTCCATAC 660

AGTGCCTGCG TAAGGGATTT TTTCAGTGAG GTCCATGACA GCGGGAAAAA CTTGCTCCGG 720

AAACGTCCGC TACAAATTCC CAGAGTAAGA TAGATCGTGG CATTAATCAG CAGAATCCAC 780

ATCAGGGCGA AGTGCCACAG TAACGCACCG CCAAGCCAGC CACCGAGAGT TAATGCTGCC 840

GGATAGTTAA AAGAAAACAA AGGAGAAGCA TTATAAATGC GCCATCCACT ACATATCATG 900 
CCTGCGACAG TAACAGCATT AATCGAGTGG CAACAGCGTA ACCACAGAGG RTGTATTTGT 960

TTTAACGGTA ATGGCTGCAT TATGTGATCT CTGTCTGTAA ACTAAGTATA TTATGGAAAG 1020

GAATGTTCAT CACATCCTCA CAAGAGTTTA AAAAAAATGT GACAANTCAT CGTCAAATGC 1080

TGGGGTAAAA TTCAGATAAA GAATATGTGG ATAACTTTTG ATGAATAACG TAAAAAAAAT 1140

ACTGCTGATG GAAGATGATT ATGATATTGC AGCTCTGTTG CGGCTTAATC TGCAGGATGA 1200

AGGGTATCAG ATAGTTCATG AAGCGGATGG CGCCAGAGCT CGTTTATTAC TAGACAAGCA 1260

GACCTGGGAT GCCGTAATAC TTGATCTTAT GCTGCCTAAT GTTAATGGGC TGGAGATTTG 1320

CCGTTATATC CGTCAGATGA CCCGTTATCT GCCTGTGATT ATCATCAGTG CCCGTACCAG 1380

CGAAACCCAC CGCGTCCTGG GACTGGAAAT GGGGGCTGAT GACTATCTAC CGAAACCCTT 1440

TTCCATTCCT GAGCTGATTG NCCCGCATCA AAGCGTTGTT TCGTCGTCAG GAAGCCATGG 1500

GGCAAAATAT TCTCGTGGCA GGTGGACTGA TTTGCTGTGA CGGTCTGTGC ATCAATCCAT 1560

TTTGACGTGA AGTTGATTTG CATAATAAAC AGGTTGATCT TACCCCACGC GAGTTTGATC 1620

TGCTGCTCTG GTTTGCACGT CATCCTGGCG AAGTTTTTTC CCGTCTTTCA CTGCTGGATA 1680

ATGTCTGGGG GTATCAGCAT GAAGGATATG AGCATACAGT CAACACGCAT ATCAACCGTC 1740

TTCGTGCCAA AATTGAACAG GATGCAGCAG AGCCAAAGAT GATCCAGACC GTCTGGGGAA 1800

AAGGGTATAG GTTTTCAGTT GAGAATGCAG GAATGCGATA AATGAATTGT AGCCTGACAT 1860

TAAGCCAGAG GTTAAGCCTA GTATTTACAG TCGTTTTGCT GTTTTGCGCC GTGGACATGT 1920

GGCGTTCATA TTTACAGCAG TAATCTGTAT GGCAATGCAA TGGTACAGCG TTTATCTGCA 1980

GGCTGGCGCA ACAGATTGTC ATCAGGGAGT CTCTGCTGGA TAATCGTGGG CAGGTGAATG 2040

ACCGGACATT AAAGAGTCTG TTTGAGCGTC TGATGACGCT TAATCCCAGT GTGGAGCTGT 2100

ATATTGTCTC GCCGGAAGGT CGGGTGCTTG TGGAGGCCGC CCCTCCAGGT CATATCAAAC 2160

GTCGGTATAT CAATATAGCG CCCTTGAAAA AATTTCTCTC CGGTGCTGTC TGGCCCGTAT 2220

ATGGTGATGA TCGCCGAAGT GTAAATAAGA AAAAAGTTTT CAGTACCGCA CCGCTTTACC 2280

TGAGGGATGA TCTGAAAGGA TATCTGTATA TTATTTTACA GGGAGAGGAA CTTAATGCTC 2340

TTACTGATGC AGCCTGGACA AAGGCACTAT GGAATGCACT GTACTGGTCG CTGTTTCTGG 2400

TAGTGATATG TGGTCTGCTG TCGGGTATGC TGGTCTGGTA CTGGGTAACC CGTCCCATAC 2460

AGCAACTAAC TGAAAATGTC AGCGGGATAG AGCAGGACAG TATTAGTGCC ATTAAACAAC 2520

TGGCAATTCA GCGCCCTGCC ACCCCCCCTA GCAACGAGGT CGAGATATTA CACAATGCGT 2580

TGATTGAACT GGCCCGTAAA ATATCCTGTC AGTGGGATGA ACTTTCAGAA AGTGATCAAC 2640

AGCGCCGTGA ATTTATTGCC AATATCTCCC ATGATTTACG GACGCCATTA AGATCACTTC 2700

TGGGATATCT GGAAACCCTG TCAATGAAGT CGGATTCGCT ATGATCAGAG GACTGTCATA 2760

AATATCTGAC AACAGCTCTC CGGCAGGGAC ACAAGGTGAG GGATCTGTCC TGTCAGCTTT 2820
TTGAGCTGGC ACGTCTTGAG CATGGTGCTA TAAAACCTCA ACTGGAGCAA TTTTCTGTCT 2880

GTGAACTTAT TCAGGATGTA GCTCAAAAAT TTGAGCTCAG CATAGAAACC CGTCGATTGC 2940

AACTAAGAAT TATGATGTCA CATTCCCTGC CTCTTATCAG GGCAGATATT TCAATGATAG 3000 

AGCGTGTGAT AACAAATTTA CTGGATAATG CTGTACGCCA CACACCTCCG GAAGGCTCGA 3060 

TCAGGCTGAA AGTCTGGCAG GAAGATAATC GGTTGCACGT CGAAGTGGCT GACAGCGGCC 3120 

CTGGACTAAC TGAAGATATG CGAACTCATC TTTTCCGGCG GGCATCAGTG TTATGTCATG 3180 

AACCGTCAGA AGAGCCCCGG GGAGGACTGG GATTGCTGAT TGTACGCAGG ATGCTGGTAC 3240

TACACGGTGG TGATATCAGG TTGACTGATT CAACGACTGG AGCCTGCTTT CGTTTTTTTC 3300 

TTCCATTATA ACATCAGGCG GCATATTTTG GGGTGGTTAT GTGTATCTGC CTTTGTAAAA 3360 

GGGATACAAG TTCTGTAGTG GAGCACAAAA TCAGGACACC GGAATAACCT GTTTCCACTT 3420

TTCTTCATGT AAGCAAGGCG GTAAACCATC GTTGTTCGTG TGAGGTCGAT AAACGTTGTA 3480

ATAACCATTA ATCCACTGGT TTATATCACG TACCGCATGG ATAAAATCAC CATAACCACC 3540

TTTCGGAAGC CATTCATTTT TAAGGCTGCG AAAGACTCTT TCCATCGGCG AATTATCCAG 3600

GCCATTCCCT CTGCAACTCA TACTTTGCAT TACCCCATAA CGCCAGAGTA ACTTTCTGTA 3660 

TTTATTGCTT TTATACTGAA CACCTTGATC TGAATGAAAC AGCAGGCGGC CATCACGCGG 3720 

TCGAGTTTCC AGTCCGTTAC GCAAAGCCCT ACACACCAAC TCAGCATCAG CGGTTAATGA 3780

GAGGGCTGAA CCGATAATCC GCCGTGAATA TAAATCAACA ACGAGCGCGA GCTAACACCA 3840

TTTGTCCTGC AGGCGAATAA AACTGATGTC GCGCACCAGA CGCAGTTTGG TGCGGCGGGG 3900

TGAAATTGCC GGTTCAGTAA ATTTGGCAAT GGCGGACTTT TGTCTTCGTT TACCCGGTTG 3960

TGATGTTTAA CCGGCTGTCG ACTTGTCAGC CCTCATTCCC GCATCAGTCG TCATGCCAGC 4020

CACCGGCCTG CATCAACGCC ACTCTGGCGC AACATCTGAC TGATTGCCCG GCTACCCGGC 4080

TGCGCCACGA CTGAGAGCAT GGAAAGCCCT CACCCGGCTT CGTAATTCAA TTCTTTGCAC 4140

ATTAACAGGA CGCTTCACCT GCGCGTAATA AACGCTACGG TTAATACCGA ATAAATGACA 4200

AATAACCCAC ACTGGCCACT TTGCTTTCAG CTGTGTGATT AGCGCGACAG CTTCCCGGGG 4260

ATTTCGCTCA TCAGCACGGC AGCCTGCTTT AGTATTTCTT TTTCCATCTC AACGCGCTTT 4320

ATCTGCGCTT TAAGCTGCTG AATTTCGCGT TGTTCAGGGG TAATAGCATT ACCAGCTGGC 4380

TCAATACCCT GAAGTTCCTG CTTATACAAC CGTATCCATT TACGCAAATG GTCAGGGTTG 4440

AGCTCGAGTG CCTGCGCGAC TTCTCTGACA TCACGCTGGT ATTTAACCAC CACCTGCTCG 4500

AAAGCTTCAA GCTTGAACTC CGGGGAAAAG GTACGTTTAG TCCGACGAGT TTTGATCATG 4560

CATCACCTCA TTTTCACTGT TTTAACATTA ACAGGATTTC GAGGTGTCCT GAATTACCGA 4620

TCCACTACAA AGTACGACAG GTACTGTGGA GGTACTCCCG TAAAGACGGC CATCAAGCTC 4680

CCGCTCCGAC ATACCTGCGG GCAGAGGCCA TGAAAAGCCA GCTTTGCGAA AGCGCACGAA 4740




WO 98/22575 PCT/US97/21347 




CATACCACAA GCTGTTGATT TTGGTACGCC CAGGCGACGC CCGACCACAA CGTGGGGTAA 4800

ATGTTCTTCA AAGTGAAGAC GTAAAGCTTC AGTGATCCAA GTCCGGTGTT TCATACGATA 4860

GTGTCCATTA AAAATGATGG ACATTATTTT TGTAAAACCG GAGGAAACAG ACCAGACGGT 4920

TTAAATGAGC CGGTTACATG TAATCCATAC TCATCCAAGG TTTAATTCTG ACACAATAAG 4980

AAAATATGGA AAGTCTCGCT CTAGAGATGG GGAGAGGGAT ATTGAAGTGT ATGATATTCC 5040

AAGAACTGCC GGAGATATCC TCGTAAATGG ATTTTCCAGT GCAAACTGAT AACAAATTCG 5100

AAGTCATTAT CTGCAACAAG ATTGATTGAT GTAGGGGATA TGTTAGAGCA TTATAATGCT 5160

CAAGGATTTG GCGTGATGAC ATCTGCGCCA ATTGATGCGA CACTATATGA TAAACTGGAT 5220

GCTATTTGCA GTAAGTGTAA AATAGAACAA ATAAATTTTT GAGTATTAGA GTCAGAACGC 5280

GCACTATATT ATGACGATAT ATTAAGATGC CGTTACTTTG GTAAATAMCA TAAAATTAAT 5340

CAATATGGTA ATATATCAGT TGTAATTGAT CGAAACAAAG CACATAAATG CCATCTTATA 5400

AAGATGGTGT TTKTTAAGCA TATAAAATAT ATTTTCTATA AGATATAGGG CAAAGTAAAT 5460

TTCTTGACTT CTATGATGGA CTAACTAGAT ATACATGCCG CCAGTTTTTA TAAAACGACG 5520

GCATATATAA TCATTTATAT ATCTTTTGAT TTTATTCGTA ACCACTCATG TTGATCTAAA 5580

CCTATTCTTG ACAGATTAGC AACAATATCA GTTGTTATTT TTTGCGCGTA CGTTGTTTTT 5640

ATTTCCCCGA TCCATTTCAA TACTTTTGGA GTAGATATTT TTTCAACGAG TAAAGGAACG 5700

AATGAGATAT AGTCAGTATT AACTAGATTG TTCTTTTTCC CTATGATGAC ACCGTTTCCA 5760

TTTTCGACTC CAAATGAAAA TGAAATAATA TTAGAAGCTT TTGCCGGCAT TTTAATTTTA 5820

TAAAAACCGC CATATTCATC TTCGATTAAC AAATTGTAAT TATTATCGTC CAGTGTTCCC 5880

CTGAGGAATA AAAAATCGGC TTTTTCATGC AATCTGACGC TATCACATAA TGGTTGTATG 5940

CATAHATAHA CAAAATTATA TGCATCTAAA AGTAAAGTTC CTTGTTTTAA GGACACATTA 6000

TCTATATGAG AATGATATCT TAAACTCCTG CGCGTGATTT CCAGAGAGCA TAATTGCATT 6060

AACTTTTTAT CTTCTTCACC ATCTTGGCTT AAGTATTCCT TTTTACCTAA AGATGCGTGT 6120

TCAATAGCGT GTTGAATTTC TTCTAAAGAA TCAGCAGAGA GTATATTCCT TAGATGTTCT 6180

ACTGATAAGT CTTTTTGTTT TTTTCCAGTT AATAGAAAAT TCTTACAAGC ATTTTTTGCA 6240

TAGTGAAAAA TAGGCCAATG GGATAAGGAG TTTTTGCTTA GAGATTTCTG GGGA 6294



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4519 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

TATTCCTTTC TCTCCCATGA TAGGGCGAAA GGCTTTATTA CTATCCACTG CTGGTTTATT 60

AATTGCATCA TCGTCGATTA ATTTGCTGGA GGTTCCAATA GTCAACCACC TCTCTTCAAA 120 

TTCATCGGTT GTCATACCTA ATCCATCATC TTTCAAGATA AGAAGATTTT CTTTCCTAAA 180

AAAATCAACT TCGACATTAT CAGCATAGGC ATCATGAGCA TTTTTAAATA ACTCACTCAA 240

GGCAGTAGGT ATACCTGCAA TTTGTTGTCT GCCAAGCATG TCCAAAGCTC GAGCCTTTGT 300 

TCTTATTTTA GCCATATATC TATGAATCCT TATTAGTACA ATTTTCTATG AGATGTAGCC 360

CAAATAGTCT AGCGAGTTCG CAAGGTACAG CATTGCCGAT TTGCTTTGCC ATTGAATTCA 420

GCGAACCTTT AAAAACATAG CTTAAAGGAA ATGTTTGTAA TCTTGATGCT TCTCTTATGC 480

TAATTGCTCT ATGTTGAGTG GGGTCAGGAT GCCCAAAACG ACCATTGGAG TAACTATTAC 540

ATTTCGTCGT AAGTGTAGGC GCAGGCTTAT CCCAACTCAT TCTTCCATAA GTATCTGTGT 600 

GGCCATCATA ATTTTTATGG CATTTATTAA CTAACTCTTC TGGCCAATTT CTTCTATCCC 660 

CTCCTTCTGG AGTGTGCATA AKTCTTTTTA GGTTAAGAGG GCTCAGTGTT CCAGCCCTAT 720 

GTAAAGGATC TTTGGGGTCG GTTTCTCCTG AACATAACTT TGTGAAGTCC TGGATATAAT 780 

CTCGTACAGT TTTGAATGGG ATTTTATTTT TACCATGGGT TATCTCTGGT AGGGTAACTT 840

TACCTACTCG ACTAGCTAAG AGCACGAGTC TTTTTCTTCT TTGGGGAATC CCATAGTTCT 900 

CAGCATTGGC TATAAAAGAT ATATAGTTAT ACTCTAACTC TTTAAGTAGC TTAATAAACT 960 

CCTGAAATGG GCCTTCTTTT TCTTCATCAA TTTTTTGCAT TCCAGGAACA TTTTCAAGCA 1020 

TAATATATTC AGGAAGAAGT TCTCTAATAA AACGATGAGT TTCATTTAGT AGATTTCTCC 1080 

TTGAGTCGTC ACTAGTTTTA TTTTTATTCT GTTGCGAAAA TGGTTGACAT GGTGCACATG 1140

CACTCAGTAA CAAAGGCCGT TTAGCTTTAA TATCAATGAT GTCGGAGATA TCTTGAGGTT 1200 

CGATTTTCCT AATATCATCT TGGATGAATT TTGCATCAGG GAAATTAGCT TTAAATGTTT 1260

CTGATGCTTG TTGGTCAATA TCTAATCCAA GCTCGATATC AAAGCCAGCC TGACGTAGCC 1320 

CTTCACTGGC TCCACCACAG CCACAAAAAA AATCTATAAC TATCAATTTG ATACCTTCTT 1380

TGAACTAAAT AAAACAACTC GAATAAGTTG ATATTTTAAA TAAAAATAAT TGGTATGGAT 1440

ATGAACTTTG GTCACGCTAC CGCCCTGAGK TCATGGCCAT CCCCAGACCT TTTAAAGGGA 1500 

TTATGAACAA CACCCAGCCG ACGTTCAACG GTGTTACCCA TACATATCAC AAAGTTAGTT 1560 

AATTGGTTGG TCGTAAATTG ACCTAAAATG GATTGAGGGC AATGCAAAAA TCATTGGGAA 1620 

ATCCAGGCGA CACAGATGTT CGGAAGAGAC TGAATGTTAA AAATATAGAA TGTATATTCT 1680 

CAAAAAAGAG ATATTTCATT ACATTTTATA TGTGTATAGG AAAGTGAGAT TGGCGAATCA 1740 

CCTCCCAATC ATCCCGCCAG CGCTCCATTC AGCGCCACGC CAACCCTCAC TCCAGCCCAC 1800 

GTCATCGCCC CCAGCCAGAA TGTCGGCAAC ACCAGAAACA TCAACCTCAT CACCAGATTG 1860

ATAATCACGT CATCCTGCGT ATTCTGGATC CCGGCTAAAT TCCAGCTACT GTGGGTATCG 1920 
CTGTTGTAGA GCACATCCAG CAGCCAGCTA TCAAGCCACC GTGCCAGTTC CCACCAAAAG 1980

GTGAGGAAAA ATAGTGCAAA CTGCACAAAC GTCAGCGTCA TCACTACTTT CACATCCCAC 2040

GCCGAACAGA GCGTTATCAG CGGAATACAG ATCACCAGCG CTATTTGCAG TGCGCCTGTA 2100 

CCATCGGTAC TGCCTAACGC ACGCTGTCGA ATGCCGTACA TGCCGCTATG CTGCCGAGGA 2160

TATTTCTAGC GGCGGATGCC AACCG jGTGG CGGCATTGGC GACGGTGCCA TCAACGTTAC 2220 

CGCCATAGCT TGGATAAACG CGCCCATTCT GCGATACCTG CATATTTCGT TCACTGACCC 2280 

GCGAGCGCAG CACGGCCTCT TCATACACTA CCTGCGACTG GTCGATTTTT TTAAACGCCG 2340

TCCAGATATC TAGGGCAGGA AGTTGCAGTA GACGGGCTTT CAGCCCAAGC GGTGTCGTCG 2400 

GCCCACCGCT GTTTACAAGT GGGATAGCCG CCCGCGCCCG TATCGGCCAG CCCGGCATCG 2460

CGCGATGCAC TGTACGGCCA AGCACTGTGT GGTGAAAGCG CATGGTCGGA AAAGGCCTGT 2520 

TCACCTAACC AAGCACATCC CACCATCACA AGAATCGCCA GAAAACCAAA CTCAGTCAGA 2580 

ATAACTCTTC CTGATTCAGG CTTTGCTCCT GCATTATGGC TACCACTATT GTTTGCCTGC 2640

ACGTATCATC TGATAACGGT TAATTAACTG ATTTAGCGCC ATTTCAGCCT GTTTTTGCTG 2700

CTGTTCACTG CCATTCTGGT TACGGACTTC ACCGTAGCGA CGTAACTGCT CTTCCGCCGG 2760

GATATGCCGG TTAAAAGCCT GCATGATGCC AAACACCTCC GTTTTCAGTT CACTGACCGT 2820 

CATGTATTTT GGCCGCTGTT CATCCTGACG GTTCAGGCGC TCAGCCAACT GCTGTAAGCG 2880

GATCATGCCT TCGTTCCAGC CCGTCATCGC CTCTTCCGGG AGCGCACGAC TCCTTACACT 2940

CTTCTGCCAG TTATCCACCA TTTCCTGAAC ACGGGGATTG CCGGGGACAA GAACCCTCAG 3000 

TTGCTGCAGG AGCTGCGCAC TGGAGCGCAG GTTGTATGCT GGAGGTAATT CTGCCAGTCG 3060 

CGTTATCTGC TGACCGGAAA GGGTTATCCA GTGCACTCAG GGCAGATACC GGATTCAGGT 3120 

TAATTTTTTC AAACAGGGAA GCATATACGC TGTCGCCGGT ATCGGTTTCA GATACCACAC 3180

TCTCTGCGAC GTTCTTTTCT TTCTGTACAG ACATCAGCAT TTTCTGTAAG CGTACAGCGA 3240

GGGCCGTATT GACGGGGATG TGTTATTCAG CTGGCAGTGC TATGCGCCAC GGAAGCAGTT 3300 

CGCTGACCCG GTTGACCGGC CAGTCTGCTA TGACGGCAAG CACATGGCGA AGGTAGCTTT 3360 

CTGGATCCAC GTCATTCAGT TTGCACGTCC CGATCAGGCT GTACAGTAGC GCTCCCCGCT 3420 

CACCACCATG GTCAGAGCCG AAGAACAGGA AGTTTTTACG ACCCAGACTG ACCGCCCGCA 3480

GGNCATNTTT CAGCGATGTT GTTGTCGATT TCCAGCCAGC CATCGTTCGC ATAGTACGTC 3540

ATGCCGGCCA CTGGTTAAGT GCGTACGCGA AGGCCTTCGC CACCATCAGG CTGGACAGGG 3600 

GACTTTCACC CCCAAGCTGC TGAACATGCC CGGCACACAA AGAAGATCTC GGCTCAGTGG 3660 

CCGGGATTAG TTATACAATT ATCTGATTGA TTTTTAATAT ATCTTTTCTT AAATCATCGT 3720

TAATATCTGA CGGTTCTAGC TGGTTTATAA GTTGCCTTAT TTGGGTAAAG GTACTTTTCT 3780 

GATCTTTTAG ATCTTGTCCT TTTATCGTTG ATAAAGCTGC AATTAGTTCA CCATCGTAAT 3840

ATTCACCCGC TAACGGCTCT TTAG7TAGAA CTTCCAACAC TCTTGGCATC AACTGATCAA 3900

TACATAAATT TTGTCGGATA GCGCGGCAAA GATCTTCCAC TGTTAACTTT TCAAGAGGCA 3960

CATCTATGAT ACGTTCGAAC CAGAGTTCAA GCGGTGATTG TTGCTCAGGC TCTTTTGTCA 4020

TATTGATGTT TCCAATCAAT TTACGTAAGG TAATCATATT CCATATCCTT TCAAGGCTGA 4080

TTCTATTTTA TTAATAGCAT CTGTTGCTCT GCCATACGCA GCCTGAGCTT CAGGATTGTT 4140

GACGTTTTTC AACGTATCCG CATGATTTCT TAATGCTCTG AGCGTATTTT GCATTTCCTG 4200

CATATGATCC CAATATCCTC CATTCTCTTT AGGAACTGGC TTACCATCCA TATGCTTGAG 4260 

AGTTCCAATT AATATCATGA ATCTTTTCAG ANGATTTTTT TAATAGTGGT TAATCGANTC 4320

TTCTTTAANT CGGCAACTTT TCTTGGCCTT CCTGGAATTA AAGGCTTTAA TCCTAACAAG 4380

TTTTTTTCTC AATTTTTGGC TGGCTTTAGG GAATCAATTT TTCCCGGATT GGGTGGGTGG 4440

GTGGTAACCC GGGTTTCCCT TGAAGCCCGG GAAACCCGGC CCCAAGTTCT TACTTTTTTT 4500

CCCGCAATCG GGTCAAGAT 4519
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1213 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATTACAGAAT GTGGAAATTA AGTATGATTC GAAAAAAGAT TCTGATGGCT GCCATCCCCC 60 

TGTTTGTTAT ATCCGGGGCA GACGCTGCTG TTTCGCTGGA CAGAACCCGC GCGGTGTTTG 120 

ACGGGAGTGA GAAGTCAATG ACGCTTGATA TCTCCAATGA TAACAAACAA CTGCCCTATC 180 

TTGCTCAGGC ATGGATAGAA AATGAAAATC AGGAAAAAAT TATTACAGGG CCGGTTATTG 240

CCACCCCTCC GGTTCAGCGC CTTGAGCCGG GTGCGAAAAG CATGGTCAGG CTGAGTACCA 300 

CACCGGATAT CAGTAAACTT CCTCAGGACA GGGAATCACT GTTTTATTTT AATCTCAGGG 360 

AAATACCGCC GAGGAGTGAA AAGGCCAATG TACTGCAGAT AGCCTTACAG ACCAAAATAA 420

AGCTTTTTTA TCGCCCGGCA GCAATTAAAA CCAGACCAAA TGAAGTATGG CAGGACCAGT 480

TAATTCTGAA CAAAGTCAGC GGTGGGTATC GTATTGAAAA CCCAACGCCC TATTATGTCA 540

CTGTTATTGG TCTGGGAGGA AGTGAAAAGC AGGCAGAGGA AGGTGAGTTT GAAACCGTGA 600 

TGCTGTCTCC CCGTTCAGAG CAGACAGTAA AATCGGCAAA TTATAATACC CCTTATCTGT 660 

CTTATATTAA TGACTATGGT GGTCGCCCGG TACTGTCGTT TATCTGTAAT GGTAGCCGTT 720 

GCTCTGTGAA AAAAGAGAAA TAATGTACCG CAATAACGGT TAAATGCGGG TGGGATATTA 780 

TGGTTGTGAA TAAAACAACA GCAGTACTGT ATCTTATTGC ACTGTCGCTG AGTGGTTTCA 840

TCCATACTTT CCTGCGGGCT GAAGAGCGGG GTATATACGA TGACGTCTTT ACTGCAGATG 900

AGTTGCGTCA TTACCGGATA AATGAAGGGG GGGGACGCAC CGGAAGCCTG ACCGTCAGTG 960

GTGCACTGCT GTCCTCACCC TGCACGCTGG TGAGTAATGA GGTGCCGTTA ARCCTCCGGC 1020 

CGGAAAATCA CTCTGCGGCA GCCGGAGCAC CTCTGATGCT GAGGCTGGCA GGATGTGGGG 1080

ACGGTGGTGC ACTTCAGCCC GGAAAACGGG GCGTTGCGAT GACAGTCTCC GGCTCACTGG 1140

TAACCGGTCC CGGAAGCGGA AGTGCTTTAC TTCCTGACCG TAASCTATCG GGCTGTGACA 1200 

TCTTGTTATA CAC 1213 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 451 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

ACGCTCTAGT ATTCTCTGTC GTTCTGCCTG GGCCACTGCA GATAGAATAG TGACAACCAT 60 

TTTACCCATC TCCCCATCGG TACTGATTCC GTCATCAATA AACCGAATGG ATACACCTTG 120 

GGCGTCAAAC TCTTTTATTA ACTGGATCAT GTCAGCAGTA TCGCGCCCAA GGGGTTCAAG 180 

TTTCTTCACC AAGATGACGT CACCTTCCTC CACCTTCATC CTCAGCAAGT CCAGCCCTTT 240

CCGATCGCTT GAACTGCCCG ATGCCTTGTC AGTAAAGATG CGATTTGCTT TCACGCCTGC 300 

GTCTTTGAGT GCCCGAACCT GAATATCGAG AGATTGCTGG CTGGTTGATA CCCGTGCGTA 360

ACCAAAAAGT CGCATAAAAA TGTATCCYAA ATCAAATATC GGACAAGCAG TGTCTGTTAT 420

AACAAAAAAT CGATTTNAAT TAGACACCNT T 451
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GACAAGGCTT ATAAACTCAC TGACGGGGCT GGCATGTTCC TGCTGGTACA TCCTAATGGT 60 

TCCCGTTACT GGCGTCTCCG TTATCGTATT CTGGGTAAGG AGAAGACTCT GGCACTTGGT 120 

GTGTATCCAG AAGTTTCTCT CTCCGAAGCT CGTACAAAAC GGGATGAGGC CCGAAAACTG 180 

ATTTCGGAGG GGATTGACCC TTGCGAACAG AAAAGAGCTA AAAAAGTAGT CCCTGATTTA 240

CAGCTCTCTT TTGAACATAT TGCACGACGC TGGCATGCCA GTAATAAACA ATGGGCACAA 300 
TCACACAGCG ATAAAGTACT CAAAAGCCTC GAAACACACG TTTTCCCCTT TATCGGCAAC 360

CGGGATATCA CAACACTCAA TACCCCGGAT CTGCTTATCC CTGTTCGTGC TGCAGAAGCT 420

AAACAAATTT ATGAAATCGC CAGTCGTCTG CAGCAAAGAA TATCTGCCGT AATGCGTTAT 480

GCCGTACAGT CTGGCATCAT CAGATATAAT CCTGCTCTGG ATATGGCTGG CGCATTGACT 540

ACGGTAAAAC GCCAGCATCG CCCCGCTCTT GATCTTTCAC GTCTGCCTGA ACTTCTGTCG 600 

CGTATTAACA GTTATAAAGG NCAGCCTGTC ACCCGGCTTG CGTTGATGCT GAATTTACTG 660 

GGTTTTTATT CGTTCCAGTG AACTCAGATA CGCCCGCTGG TTCTGAAAAT TGATATTGGA 720 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2920 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



NCNTTAATTT TATATCTCGT AAAATAAAAT GTTTTCTGTA CCGCTCTCCG GAGGGGGGAA 60

TGATTCGTTT ATCATTATTT ATATCGTTGC TTCTGACATC GGTCGCTGTA CTGGCTGATG 120

TGCAGATTAA CATCAGGGGA AATGTTTATA TCCCCCCATG CACCATTAAT AACGGGCAGA 180

ATATTGTTGT TGATTTTGGG AATATTAATC CTGAGCATGT GGACAACTCA CGTGGTGAAG 240

TCACAAAAAC CATAAGCATA TCCTGTCCGT ATAAGAGTGG CTCTCTCTGG ATAAAAGTTA 300

CGGGAAATAC TATGGGAGGA GGTCAGAATA ATGTACTGGC AACAAATATA ACTCATTTTG 360

GTATAGCGCT GTATCAGGGA AAAGGAATGT CAACACCTCT TACATTAGGT AATGGTTCAG 420

GAAATGGTTA CAGAGTTACA GCAGGTCTGG ACACAGCACG TTCAACGTTC ACCTTTACTT 480

CAGTGCCCTT TCGTAATGGC AGCGGGATAC TGAATGGCGG GGATTTCCGG ACCACGGCCA 540

GTATGAGCAT GATTTATAAC TGAGTCATAC CCAAATGAAT AACTGTAATT ACGGAAGTGA 600

TTTCTGATGA AAAAATGGCK CCCTGCTTTT TTATTTTTAT CCCTGTCAGG CTGTAATGAT 660

GCTCTGGCTG CAAACCAGAG TACAATGTTT TACTCGTTTA ATGATAACAT TTATCGTCST 720

CAACTTAGTG TTAAAGTAAC CGATATTGTT CAATTCATAG TGGATATAAA CTCCGCATCA 780

AGTACGGCAA CTTTAAGCTA TGTGGCCTGC AATGGATTTA CCTGGACTCA TGRTCTTTAC 840

TGGTCTGAGT ATTTTGCATG GCTGGTTGTT CCTAAACATG TTTCCTATAA TGGATATAAT 900

ATATATCTTG AACTTCAGTC CAGAGGAAGT TTTTCACTTG ATGCAGAAGA TAATGATAAT 960

TACTATCTTA CCAAGGGATT TGCATGGGAT GAAGCAAACA CATCTGGACA GACATGTTTC 1020

AATATCGGAG AAAAAAGAAG TCTGGCATGG TCATTTGGTG GTGTTACCCT GAACGCCAGA 1080

TTGCCTGTTG ACCTTCCTAA GGGGGATTAT ACGTTTCCAG TTAAGTTCTT ACGTGGCATT 1140

CAGCGTAATA ATTATGATTA TATTGGTGGA CGCTACAAAA TCCCTTCTTC GTTAATGAAA 1200

ACATTTCCTT TTAATGGTAC ATTGAATTTC TCAATTAAAA ATACCGGAGN ATGCCGTCCT 1260

TCTGOACAGT CTCTGGAAAT AAATCATGGT GATCTGTCGA TTAATAGCGC TAATAATCAT 1320

TATGCGGCTC AGACTCTTTC TGTGTCTTGC GATGTGCCTA CAAATATTCG TTTTTTCCTG 1380

TTAAGCAATA CAAATCCGGC ATACAGCCAT GGTCAGCAAT TTTCGGTTGG TCTGGGTCAT 1440

GGCTGGGACT CCATTATTTC GATTAATGGC GTGGACAGAG GAGAGACAAC GATGAGATGG 1500

TACAGAGCAG GTACACAAAA CCTGACCATC GCAGTCGCCT CTATGGTGAA TCTTCAAAGA 1560

TACAACCAGG AGTACTATCT GGTTCAGCAA GGCTGCTCAT GATATTGCCA TAAATGGTTT 1620

ATCCGGAGCC GGATAGTGTG TTGTGGATAT CTGGCATGCC CCGGGAAGTC ACCTTTCAGA 1680

CGGGCGGAGG GCTGGTGAAT TATCCGCGAT TACTGAGCAG TATGGATAAT CCTTTTTCAC 1740

AGACTTGTCA GCAGCCAGCA TTTATGTTCT TTTATCTGAG GGAATTTATC TGTACGCTGT 1800

GCCGGGATAT CTCAGTTATA CAGAAATCAG GCAGGAATAA ATTGTAGTGG AAAGTCGATG 1860

TTTACCGGAT GACTGATGCG CGCTTGTACA CAGACAGTGT GTTTCAGTAA TATGGAGAAT 1920

AATGAAATGA ATAACACAGA CACATTAGAA AAAATAATCA GACACCAAAA AAACAAAGAC 1980

CCCGCATATC CTTTCGGGAA CATTTGTTGA TGCAGCTCTG TATTCGCACA AATAAAAGAA 2040

TGCAGGATAA TATATCTGAA TTTCTGGGGG CGTATGGAAT AAATCACTCA GCATATATGG 2100

TCCTCACCAC ATTATTCGCA GCGGAGAACC ATTGTCTGTC ACCTTCAGAG ATAAGCCAGA 2160

AACTTCAGTT TACCAGAACT AATATTACCC GCATTACAGA TTTTTTAGAA AAAGCCGGAT 2220

ATGTAAAAAG GACGGATAGC AGGGAGGATC GCCGTGCTAA AAAAATCAGT CTGACATCTG 2280

AAGGTATGTT TTTTATTCAG AGGCTCACTC TTGCACAAAG CATGTATCTG AAAGAAATCT 2340

GGGATTATCT GACCCATGAT GAACAGGAAC TGTTTGAAGT CATTAATAAA AAATTACTGG 2400

CACATTTTTC TGATGCCAGC TCATAAAGTG CGAAATATCT GAGGATGCCG GATAGCTTCA 2460

GGCAAAATAA TAATGATTCT TGCAGATGTG TTTTTCCGGA TACAAAAACA AATGATAAAA 2520

ATTGCAGCGC CAGGCACCTT TCAAAGCAGG GAGACCTGTA CCGCGTCGAA AATTTCAGCC 2580

AGTTAATATC ATTGTCTGAA CCAGGCACTT TGCCCGGGCA GGAGAAGGAG TTGTGGCGGT 2640

CTCAGCCCGG AACAATTTGA AAACCATAAT CTCGCTTAGG GCCGTGTCCA CATTACGTGG 2700

GTAGGATCAC TCCTGGATTT TCTCTTTTTG GACATTGACG TCTCCATTGG TTTAAACACG 2760

GCAATGGAGA CTGCGGTGAA AAGAGTTAAT TCCCGGAGTG ACTGGCTGGA TGCCAATCAA 2820

TGATCGGAAG CATGCCAAAC TGTGAACGGA GATGGATGCC GCCAAATCAT GATCGATTCA 2880

GATGCCATAT TTGCAATATC GCGTTAATCG TCAGTTCAGC 2920


(2) INFORMATION FOR SEQ ID NO: 11: 









(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1678 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GGTAAGGAAG TTATATATAT GAGCAACTAT ACATCTTAGA TGTATGATAA AGAAAAAGAT 60 

AACAGTTCTT TAGAATATGT ATATTGAAGA GAATGCAATA GCATGGTTTA TATAAATTAC 120 

GCATAAAAAT AAGCATATGT AAGCATTTTG GTTTGCTTTT TTTAACCTGC CACCGCAATG 180 

AATGCTTTTT TTATGTTAAT GTGCGTTATG AAACTAAATG CAAGAAACAT ATTTAAAGGA 240

TTAATATCGT TCTCTCACAG ACTCCGTTTA CTTATTCAAG AATATAATTT AATTTATAGT 300 

GAGCTTATTA TGAATATGAA CAATCCATTA GAGGKTCTTG GGCATGTATC CTGGCTCKGG 360

GGCCAGTTCC CCATTACACA GAAACYGGCC AGTTTCTTTG TTTGCAATAA ATGTATTACC 420

TGCAATACGG GGCTAACCAA TATGCTTTAT TAACCCGGGG ATAATTACCC TGTTGCATAT 480

TGTAGTTGGG GCTAATTTAA GTTTAGAAAA TGAAATTAAA TATCCTAATG ATGTTACCTC 540

ATTAGTCGCA GAAGACTGGA CTTCAGGTGA TCGTAAAKGG TYCATTGACT GGATTGCTCC 600 

TTTCGGGGAT AACGGTGCCC TGTACAAATA TATGGGAAAA AAATTCCCTG ATGAACTATT 660 

CCGAGCCATC AGGGTGGATY CCAAAACTCA TGTTGGTAAA GTATCAGAAT TTCACGGAGG 720 

TAAAATTGAT AAACAGTTAG CGAATAAAAT TTTTAAACAA TATCACCACG AGTTAATAAC 780 

TGAAGTAAAA AACAAGACAG ATTTCAATTT TTCATTAACA GGTTAAGAGG TAATTAAATG 840

CCAACAATAA CCACTGCACA AATTAAAAGC ACACTACAGT CTGCAAAGCA ATCCGCTGCA 900 

AATAAATTGC ACTCAGCAGG ACAAAGCACG AAAGATGCAT TAAAAAAAGC AGCAGAGCAA 960 

ACCCGCAATG GGGGAAAACA GACTCATTTT TACTTATCCC TAAAGATTAT AAAGGACAGG 1020

GTTCAAGCCT TAATGACCTT GTCAGGACGG CAGATGAACT GGGAATTGAA GTCCAGTATG 1080

ATGAAAAGAA TGGCACGGCG ATTACTAAAC AGGTATTCGG CACAGCAGAG AAACTCATTG 1140

GCCTCACCGA ACGGGGAGTG ACTATCTTTG CACCACAATT AGACAAATTA CTGCAAAAGT 1200

ATCAAAAAGC GGGTAATAAA TTAGGCGGCA GTGCTGAAAA TATAGGTGAT AACTTAGGAA 1260

AGGCAGGCAG TGTACTGTCA ACGTTTCAAA ATTTTCTGGG TACTGCACTT TCCTCAATGA 1320 

AAATAGACGA ACTGATAAAG AAACAAAAAT CTGGTAGCAA TGTCAGTTCT TCTGAACTGG 1380

CAAAAGCGAG TATTGAGCTA ATCAACCAAC TCGTGGACAC AGCTGCCAGC ATTAATAATA 1440

ATGTTAACTC ATTTTCTCAA CAACTCAATA AGCTGGGAAG TGTATTATCC AATACAAAGC 1500 

ACCTGAACGG TGTTGGTAAT AAGTTACAGA ATTTACCTAA CCTTGGATAA TATCGGTGCA 1560

GGGTTAGATA CTGTATCGGG KATTTTATCT GCGRTTTCAG CAAGCTTCAT TCTGAGSCAT 1620 
GCAGATGCAG ATACCGGRAC TAAAGCTGCC AGCAGGTGTT GGATTNACCA ACGGAANT 1678

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2676 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AAGGATTACT TTGGAATCTG ACAACAAAGT TACTATGAAA AAGAACTAAC AAAGTTATAT 60

AATGACGCTA AAAATGCTTT GAAAGATGTG CAATCTAAAG CAAATAGGTT AATTTCTGAT 120 

AATAAGANAA AACATAAGAG TGAACTAAAA AACATTTCTT ATGAATTCCA ATCAACTAAT 180 

CTCAATGGCA AAGATACTGC GTATATATTG GATGTARAAA GAAATCTAGA AAGTAAAATT 240

GAGAATACTT CAAACGAATG AGTGTAATGA AATAAGAAAA CTAACCGACC AGATTGCAAT 300 

AATTAGTGAT AGTACCACTT CTGAAAATTT ATCATCGGCT CAAGTAACTG AAGCAATCGA 360

AACTGAACTT GAACATTTAC GAGACCAACA AGCAAATAAC GCAGAGTTAA TACTACTTGG 420

CATGGCTCTT TCTGTAGTAC ATCATGNATT TAATGGTAAT ATTAGGGCAA TTAGAAGTGC 480

GCTAAGGGAA TTAAAAGCAT GGGCTGACAG AAATCCTAAG CTTGATATTA TATACCAAAA 540

AATCAGAACT AGTTTTGATC ACTTAGATGG TTATTTAAAA ACCTTTACAC CATTGACAAG 600 

ACGTTTAAGT CGCTCTMAAA CCAATATAAC TGGAACTGCC ATTTTAGAAT TTATCAGAGA 660 

TGTATTCGAT GATCGTCTTG AGAAAGAAGG AATTGAATTA TTCACTACCT CAAAGTTTGT 720 

TAATCAAGAA ATTGTAACTT ACACATCAAC CATTTACCCT GTCTTTATAA ATCTAATTGA 780

TAACGCAATA TACTGGCTTG GGAAAACAAC TGGAGAAAAA AGACTTATAC TTGATGCKAC 840

TGAAACAGGA TTTGTTATTG GTGATACTGG TCCCGGTGTT TCAACTAGAG ATCGAGATAT 900 

AATATTTGAT ATGGGATTTA CACGAAAAAC AGGAGGGCGT GGAATGGGAT TATTCATTTC 960

CAAAGAGTGT TTATCTCGAG ATGGATTTAC TATAAGATTG GATGATTACA CTCCTGAACA 1020 

GGGTGCTTTC TTTATTATTG AGCCATCAGA AGAAACAAGT GAATAGCGGA TATAAATAAA 1080 

TGACAAGCTC TACTGATTTN CATAAACTTT CTGAAGACTG CGTTCGCCGT TTTTTACATT 1140

CTGTAGTTGC TGTAGATGAC AATATGTCTT TTGGAGCTGG TAGTGATACT TTCCCTACAG 1200 

ACGAAGATAT TAATGCTTTA GTTGATCCCG ACGATGATCC TACACCAATA ATAACAGCAT 1260

CAGCATCCCC AAGGATAGAA TCAACTAAAT CAAAAGCAAA GGTAAAAAAC CATCCTTTTG 1320 

ATTACCAAGC TCTAGCAGAA GCTTTC3CCA AAGATGGTAT TGCTTGTTGC GGATTATTAG 1380 

CTAAGGAAGG TGCGAATAAG CGGGGAAATT CTTCTCGGCT GACTCAGTCA TTTCATTTCT 1440

TCATGTTTGA GCCGATTTTT TCTC2CGTAA ATGCCTTGAA TCAGCCTATT TAGACCGTTT 1500

CTTCGCCATT TAAGGCGTTA TCCCCAGTTT TTAGTGAGAT CTCTCCCACT GACGTATCAT 1560

TTGGTCCGCC CGAAACAGGT TGGCCAGCGT GAATAACATC GCCACTTGGT TATCGTTTTT 1620

CAGCAACCCC TTGTATCTGG CTTTCACGAA GCCGAACTGT CGCTTGATGA TGGGAAATGG 1680

GTGCTCCACC CTGGCCCGGA TGCTGGCTTT CATGTATTCG ATGTTGATGG CCGTTTTGTT 1740

CTTGCGTGGA TGCTGTTTCA AGGT7CTTAC CTTGCCGGGG CGCTCGGCGA TCAGCCAGTC 1800

CACATCCACC TCGGGCAGGT CCTCGCGCTG TGGCGCCCCT TGGTAGCCGG CATGGGCTGA 1860

GACAAATTGC TCCTCTCCAT GCAGCAGATT ACCGAGCTGA TTGAGGTCAT GCTCGTTGGC 1920

CGCGGTGGTG ACCAGGGTGT GGGTCAGGCC ACTCTTGGGA TCGACACCAA TGTGGGCCTT 1980 

CATGCCAAAG TGCCACTGAT TGCCTTTCTT GGTCTGATGC ATCTCCGGAT CGCGTTGCTG 2040

CTCTTTGTTC TTGGTCGAGC TGGGTGGCTG AATGATGGTG GGATCGACCA AGGTGCCTTG 2100 

AGTCATCATG ACGCCTGCTT CGGCGAGCCA GCGATTGATG GTCTTGAACA ATTGGCGGGC 2160 

CAGTTGATGC TGCTCCAGCA GGTGGCGGAA ATTCATGATG GTGGTGCGGT CCGGCAAGGC 2220 

GCTATCCAGG GATAACCGGG CAAACAGAGG CATGGAGGCG ATTTCGTACA GAGGATCTTC 2280

CATCGCGCCA TCGCTCAGGT TGTAGGAATG GTGCATGCAG TGAATGCGTA GCATGGTTTC 2340

CAGCGGATAA GGTCGCCGGC CATTACCAGC CTTGGGGTAA AAGGGCTCGA TGACTTCCAC 2400

CATGTTTTGC CATGGCAGAA TCTGGTCCAT GCGGGACAAG AAAATCTCTT TTCTGGTCTG 2460

ACGGCGCTTA CTGCTGAATT CACTGTCGGC GAAGGTAAGT TGATGACTCA TGATGAACCC 2520 

TGTTCTATGG CTCCAGATGA CAAAGATGAT CTCATATCAG GGACTTGTTC GCAGGTTCCC 2580

TAAGAGTTTT AATGTTTGAA GAAAGAGATA TAATTACAGC ATCATCCCAC AAAGGAGATA 2640

TTACAATACC TTGACTGGGN TATTGCCAAG CGGATA 2676



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1485 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

AAATTTGTCC TCCGGNTCTT TTCCCGTGGA TACGGGCATT GAGACCCGAA AGGSCCTGTA 60 

TTTGCGACCG GAGAGGCATC CTGGGGGCTC AGTAAACCAG TGGTCGCTGT ATGGCGGGGC 120 

TGTGCTTGCC GGTGATTATA ATGNCACTGG SAGCCGGTGC CGGCTGGGAC CTGGGTGTGC 180 

CGGGGACCCT TTCCGCTGAT ATCACGCAGT CAGTAGCCCG TATTGAGGGA GAGAGAACGT 240 

TTCAGGGAAA ATCCTGGCGT CTGAGCTACT CCAAACGGTT TGATAATGCG GATGCCGACA 300 

TTACGTTCGC CGGGTATCGT TTCTCAGAGC GAAACTATAT GACCATGGAG CAGTACCTGA 360 

ACGCCCGCTA CCGTAATGAT TACAGCAGTC GGGAAAAAGA GATGTATACC GTTACGCTGA 420

ATAAAAACGT GGCGGACTGG AACACCTCTT TTAACCTGCA GTACAGCCGT CAGACATACT 480

GGGACATACG GAAAACGGAC TATTATAGGG TGAGCGTCAA CCGCTACTTT AATGTTTTGG 540

GACTGCAGGG TGTGGCGGTT GGATTGTCAG CCTCAAGGTC TAAATATCTG GGGCGTGATA 600 

ACRRTTCTGC TTACCTGCGT ATATCCGTGC CGCTGGGGAC GGGGACAGCG AGCTACAGTG 660 

GCAGTATGAG TAATGACCGT TATGTGAATA TGGCCGGCTA GACTGACACG TTCAATGACG 720

GTCTGGACAG CTACAGCCTG AACGCCGGCC TTAACAGTGG CGGTGGACTG ACATCGCAAC 780

GTCAGATTAA TGCCTATTAC AGTCATCGTA GTCCGCTGGG AAATTTGTCC GCGAATATTG 840

CATCCCTGCA GAAAGGATAT ACGTCTTTCG GCGTCAGTGC TTCCGGTGGG GCAACAATTA 900 

CCGGAAAAGG TGGGGCGTTA CATGCAGGGG GAATGTCCGG TGGAACACGT CTTCTTGTTG 960

ACACGGATGG TGTGGGAGGT GTACCGGTTG ATGGCGGGCA GGTGGTGACA AATGGCTGGG 1020 

GAACGGGCGT GGTGACTGAC ATCAGCAGTT ATTACCGGAA TACAACCTCT GTTGACCTGA 1080 

AGCGCTTACC GGATGATGTG GAAGCAACCC GTTCTGTTGT GGAATCGGCG CTGACAGAAG 1140 

GTGCCATTGG TTACCGGAAA TTCAGCGTGC TTAAAGGGAA ACGTCTGTTT GCAATACTGC 1200 

GTCTTGCTGA TGGCTCTCAG CCCCCGTTTG GTGCCAGTGT AACCAGTGAA AAAGGCCGGG 1260

AACTGGGCAT GGTGGCCGAC GAAGGCCTTG CCTGGCTGAG TGGCGTGACG CCGGGGGAAA 1320 

CCCTGTCGGT AAACTGGGAT GGAAAAATAC AGTGTCAGGT AAATGTACCG GAGACAGCAA 1380 

TATCTGACCA GCAGTTATTG CTTCCCTGTA CGCCTCAGAA ATAAATGAAA GTCCGGAATA 1440

TTAACGGCTG ATTGAATTGC GGTTTATGCC ATTTTCCCGG ACCAA 1485



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22671 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

TTACCAATTT CATCGTCCGG TACATCCTCC AGAACATCTC GCAATAAACT CTCGTCTGCC 60 

TCATTCCATG CCACACCAGC ATTTGGGAAA CGAGGATCGA TCTCTCTTTC CTTCTTCTCC 120 

TTCTTACTTT GCTCTTTTCG GGATGATACA GATACGACAG AACGTTCTTT TACCGCTGTA 180

ATTGCCATAA CTGCATTGAG CAGAGATCTG CGCTCCACAT CGTTCAGCAT TTTTCCTTCA 240

CAGATCAAAT CATTCAGGAT GTCAATGACT AGATTCAGAC TTTCTTCTGT TAGCTTCATA 300 

TTTCAGACCT TGAAGTATGT AGATAATCAG CACAATTACT AATGTGATAA ATATCAGAAG 360 

ATAATTTACA GGTAAACCGG AAAATACATC TGAAGAATAA AGGCCTCAGC TTAACGTTTC 420



WO 98/22575 PCT/US97/21347




AGCCAGTTTG TGAGCTGATT GAGGTACGGC GATGACATTA ACGGGAATTA CTCCCrTATA 480

GCTCTGAGCT TATTTTTCAC CCTGGCAACA TATGGTGGCT ACTGCGCATG GTTTTGGAGT 540

AGATATCTTA CTACTCGTAG AATTGTGCTT ACTGGTCAGG CCAGCGCACA GGCATTCCGT 600 

GCAATCAATA GAACACTGGT TTTTTAGTCT TGCGTTAGGC ATCAGGATGT TAGTGCAGAT 660

TCGGGTGTAT TCGATCAGTT GTTCGGCGAA TCAGCGATCG ATCAGGATGC GATTTCGTAT 720 

GTTAGGGATG CTGGTATGAT TACTCGCTGA AAAATAATGT GAAAAGGCAG TTTTTGTTTA 780

GACATTTAGC TCATTCATGC TGTTGTTTTA CGTTTTGCTG TCGTGTGCAG GATTATCTTT 840

TCGTTAGGGG ACGATTCATT GCGTTTTAAT CAGGAGCTAT TGGCGTTGCT CATTGGTGGG 900 

ATGCCGTAAA GTTTTACCGC GGCGATTAAT GATGTGAAGT CAATCCAAAT CAACGGAGAT 960 

CTCTCATCAT GAATCAACCA ATACACAATG ATTACTGGTT ATCCCGTTTT GAAAGTATTC 1020 

TCAACAGTGC CCTGGTGCAA CACCGTGCCG TCTCGTTAAT CTGGGTGGAT TTACGTTTCC 1080

CTGAGCATAT GCCTGTCACC ATCATGGATC CCGATCCGGA TTCAGCGGTG ATTTGTCGTT 1140

TTTTCGAATC CCTGAAAGGC AAAATTCAGG CTTACCAGCG GAAAAAACGA CGTACCAACA 1200 

AGCGTGTGCG TGCAACCACC CTGCATTATT TCTGGTGTCG GGAGTTTGGC AAGGAAAAAG 1260

GCAGGAAACA TTATCACGTG ATATTACTGC TCAACAAAGA TACCTGGTGC TCGGCAGGGG 1320 

ATTTCACCGT TCCTTCTTCG CTGGCGACGC TGATCCAACT GGCATGGTGT AGCGCTCTGC 1380 

ATCTTGAGCC CTGGCAGGGT AATGGACTGG TTCATTTTTC CAGGCGGACG CYTTTGCGTA 1440

AACCGGTATC ATCTGATGCT CGCCCTTCTT CCGATGATAC GCCTTTGTCG GGTGGATGTT 1500 

CTGAAACCAG GAAGGCTTCA GACAAAAAGC CGGGTGAAGC CGCTGTTCTG TGGATCAAGC 1560 

GTGGTGATGT GGAAGCGATG CAGAAAGCCA TGGAGAGAGC CCGTTATCTC GTGAAGTATG 1620 

AGACGAAGCA GCATGACGGT TCTGGTCAAC GTAATTATGG TTGCAGCCGT GGAGCGGGGC 1680

GTCTACTGGA TGGCAGGTGA ACCCTGTAAA ACGGCATCCG GTGCCAGAGT ATATGTCACA 1740

GTAAGGGCGT GGTTGATGCC CTTAGCTCGT TTTCTGAAAA AGTCGTCCTG AAGTGATGTG 1800 

TCACGAACGG TGCAATAGTG ATCCACACCC AACGCGTGAA ATCAGATCGA GGGGGTAATC 1860

TGCTCTCCTG ATTCAGGAGA GYTTATGGTC ACTTTTGAGA CAGTTATGGA AATTAAAATC 1920 

CTGCACAAGC AGGGAATGAG TAGCCGGGCG ATTGCCAGAG AACTGGGGAT CTCCCGCAAT 1980 

ACGGTTAAAC GTTATTTGCA GGCAAAATCT GAGGCGCCAA AATATACGCC GCGACCTGCT 2040

GTTGCTTCAC TCCTGGATGA ATACCGGGAT TATATTCGTG AACGCATCGC CGATGCTCAT 2100 

CCTTACAAAA TCCCGGCAAC GGTAATCGGT CGAGAGATCA GAGACCAGGG ATATCGTGGC 2160 

GGAATGACCA TTCTCAGGGC ATTCATTCGT TCTCTCTCGG TTCCTCAGGA GCAGGAGCCT 2220 

GCGGTTCGGT TCGAAACTGA ACCCGGACGA CAGATGCAGG TTGACTGGGG CACTATGCGT 2280 

AATGGTCGCT CACCGCTTCA CGTGTTCGTT GCTGTTGTCG GATACAGCCG AATGCTGTAC 2340

ATCGAATTCA :tgacaatat GCGTTATGAC AGGCTGGAGA CCTGCCATCG TAATGCGTTC 2400

CGCTTCTTTG 3TGGTGTGCC GCGCGAAGTG TTGTATGACA ATATGAAAAC TGTGGTTCTG 2460

CAACGTGACG CATATCAGAC CGGTCAGCAC CGGTTCCATC CTTCGTTGTG GCAGTTCGGC 2520

AAGGAGATGG GCTTCTCTCC CCGACTGTGT CGCCCCTTCA GGGCACAGAC TAAAGGTAAG 2580

GTGGAACGGA TGGTGCAGTA CACCCGTAAC AGTTTTTACA TGCCACTAAT GACTCGCCTG 2640

CGACCGATGG GGATCACTGT CGATGTTGAA ACAGCCAGCC GCCACGGTCT GCGCTGGCTG 2700

CACGATGTCG CTAACCAACG AAAGCATGAA ACAATCCAGG CCCGTCCCTG CGATCGCTGG 2760

CTCGAAGAGC AGCAGTCCAT GCTGGCACTG CCTCCGGAGA AAAAAGAGTA TGACGTGCAT 2820

CCTGGTGAAA ATCTGGTGAA CTTCGACAAA CACCCCCTGC ATGATCCACT CTCCATTTAC 2880

GACTCATTCT GCAGAGGAGT GGCGTGATGA TGGAACTGCA ACATCAACGA CTGATGGCGC 2940

TCGCCGGGCA GTTGCAACTG GAAAGCCTTA TAAGCGCAGC GCCTGCGCTG TCACAACAGG 3000

CAGTAGACCA GGAATGGAGT TATATGGACT TCCTGGAGCA TCTGCTTCAT GAAGAAAAAC 3060

TGGCACGTCA TCAACGTAAA CAGGCGATGT ATACCCGAAT GGCAGCCTTC CCGGCGGTGA 3120

AAACGTTCGA AGAGTATGAC TTCACATTCG CCACCGGAGC ACCGCAGAAG CAACTCCAGT 3180

CGTTACGCTC ACTGAGCTTC ATAGAACGTA ATGAAAATAT CGTATTACTG GGACCATCAG 3240

GTGTGGGGAA AACCCATCTG GCAATAGCGA TGGGCTATGA AGGAGTCCGT GCAGGTATCA 3300

AAGTTCGCTT CACAACAGCA GCAGATCTGT TACTTCAGTT ATCTACGGCA CAACGTCAGG 3360

GCCGTTATAA AACGACGCTT GAGCGTGGAG TAATGGCCCC CCGCCTGGTC ATCATTGATG 3420

AAATAGGCTA TCTGCCGTTC AGTCAGGAAG AAGCAAAACT GTTCTTCCAG GTCATTGCTA 3480

AACGTTACGA AAAGAGCGCA ATGATCCTGA CATCCAATCT GCGGTTCGGG CAGTGGGATC 3540

AAACGTTCGC CGGTGATGCA GCCCTGACGT CAGCGATGGT GGACCGTATC TTACACCACT 3600

CACATGTCGT TCAAATCAAA GGAGAAAGCT ATCGACTCAG AGAGAAACGA AAGGCCGGGG 3660

TTATAGCAGA AGCTAATCCT GAGTAAAACG GTGGATCAAT ATTGGGCCGT TGGTGGAGAT 3720

ATAAGTGGAT CACTTTTCAT CCGTCGTTGA CATCATGCAA TGTTTCCTGG TTTTCATGCA 3780

TCCATCATTT GTCGCTGCGA TGCCAGACTT CTGGATGGAC ACATGTTGTT TTACTTTTGT 3840

CAGCATCATA AATGCGCCGG GACTGGTGAA TGGAGATAAG CCATTTTATT ATCGACGTCA 3900

GCGAACATAC TCACCATGCC GGTATGTTCC TGAACTGAAC AATAAGTTTT GCGCTGATTA 3960

CAGTATGTGA AGGAGGTCCG TTACAATGAA TTCCGCTTAT ATGCAATCCT TGCAGACATC 4020

CCACCACTTC CCAGCTGATT TAAGCTACAG ATTATTTCCT AGTGAGCTTG CATATCTGAT 4080

TGACGACTTA TATGAAAGTA CCCAACTTCC GCTGGAGCTC ATTTTTAATA CTGTACTGGG 4140

AACGCTCTCA CTCTCCTGTC AGTCACTGGT TGAGGTTGTT CATCCTCACA GCAACATGCC 4200

GGAACCCTGC TCACTTTATC TGTTGGCAAT CGCAGAGCCA GGCGCGGGAA AAACAACGAT 4260

AAACAGACTG GTGATGAACC CCTGTTACGA ATTTGCCGAT CGACTCATTC AACAATACGA 4320

AGAGAGAAAC AAAGATTATA AGACTGAACT ACAGATCTGG AATACCCGGC AGAAAGCGCT 4380

TGCTGCCAAT TTAAGAAAGG CTGTTAACCG GGGGTATCCG GGGGAACAGG AAGAAGAGGC 4440

GCTGCGTAAT CACGAAAGAA ATAAACCGAC ACGTCCGGTT CGACCGAATT TTATCTATGA 4500

AGATGTTTCG CTTAAAGCGC TTGTGGAAGG GCTCAATGAA CATCCTGAGG CAGGGGTTAT 4560

TTCTGACGAG GCGGTCACTT TTTTCAGAAG CTATCTGAAA AATTATCCGG GCCTGTTGAA 4620

TAAAGCATGG AGTGGACAAC CGTTTGATTT TGGACGGGCT GACGAGAAAT ACCATATCAC 4680

GCCACGTCTG ACATTTTCGT TAATGTCCCA GCCGGATGTC TTTACGAATT ATATAAATAA 4740

AAATGACGTA GTGGCGTGGG GAAGCGGATT TCTTTCCCGG TTTCTGTTCA GTCAGACCGG 4800

AAGTCCTTCC CGGGTACGGG ATTATACGAG AGGCGAGTTC AGAACAAAAC CAACCCTGGA 4860

GAAGTTTCAT AAAAAGATTA ACGGATTTCT GTTAAGCCAT AACATTAATT CCCCCGGTAT 4920

GAGCACCGAA AGGAAAACAT TAAAACTTGC AAAGAAAGCG TTGGGGGAGT GGCAGGAAAA 4980

CCAGATTAAG ATTGAAAGAA AAGCGCTTGC AGGAGGGGAG TGGGAACACA TCAGAGATAT 5040

TGTTCTGAAA GCAGGTTCTA ATATACTGAG GATAGCTGGA ATATTCACCT GCTATTGCTA 5100

TAAAGATGCT GAGGAAATTG AATCAATTGC GCTTTTTAAA GCTATGCATC TCATGGGCTG 5160

GTATCTGGAG GAGGCGAGCA CAATATTTTA TCCCATGTCT GCACGATGCC AGTTTGAACA 5220

GGATGCCTGT GAACTGTATG CATGGATTAT GACCCGAATA AGGCAGAATA ATTGGCGTGC 5280

TATCAGGAAA ACAGACATTG AAAGATATGG TCCCAATCGT CTGAGAAGAG CAGAAAAACT 5340

TACACCTGTA CTCAATCAGT TAATCGYTCA GAATTATTTC CGTATCATCM AAGATGCGAT 5400

CGCATCAGGC ACTTTATGTT TCTGCTCTTG ATAATAATGG TTACATCCTT CCTTTCGGCG 5460

CAATGTCTTA CGAACCGTTT GATATTGTTC CACCCCAGTA TAACCATAAT GCGAAAACAT 5520

ATTCCGTTGT TATTCCACCG GCATTAATTC AGTCATTTAC ACCTGATTCC TCAGCTTACA 5580

CCTTATTTTA AAACAATTTT GTGAGTAGAA AACGAAAATC ATAATCCTTC GAATGAAGGT 5640

TAATGATAAG GTGTGTTGCA TATCCTGCAC CTGTGCAAAT ATTCACCAAT CATTGGGTGT 5700

GAATGAAAAT TTCTCTGAAA AAATCGCTAT GGTAGCAACA GTAGCAGCAC ATACACTACA 5760

TCTGTGATTT GGTTTTGTTT TCATAATGAC CTGCTGTCAG AGCTGATTGA ATGCTGGGAT 5820

GTGCGCACTG GTGGAAGAGT GGTTTTCGTT TCAGATATAA CGAAAGGTAA TCGAAAGATT 5880

GTTTTAAACA TGGATTAAAG CTAATAATTA ACCATATTGT GTGAGTTTTT ATATATAAGT 5940

TTGTTTGATT CTTGCCGTGA TGAGTGCTGG GGTATATGAC GATGTCGCTC TCTTTCTGAA 6000

TAACAAATTA TTATTCGTCT GTTAGTGATA AGGGATGCGA TTCATGTTTT AATAGAGGGT 6060

TGAAGAAAAT TAATTTGATA TTTTTTTGTA AGGGAATGGA ACTGTCCGGA ATATGTTCAG 6120

AACGGCGGAT TTCTCATTTC CATTCATTAA ACATGGATAA TTTTAATTTA GGTTTATTAC 6180

TATTATTATA CTCACTCCCT TTTTCATACA ATCTCTATTG TTATTTACTT CCTGTCTTTA 6240

CTCACTCTCT ATCTTTACGA TTATATTCAC TCTATCGTTA CACATTCCAT TAGTATTACT 6300

CTTGTTATCG TATTCATTCC ATCCCTCAAT CATATTTACT GTAACTCATA TGATGTTCAG 6360

GTAAGTTATT CTCTACCATT CTACTGATGA TATCCATCTG TTCTCATTTT CAGTGAAACA 6420

GCAATTGATT TTAATCTTAT CCATCATGAA CTGTATTTGC TTAACAATGA TTGTTTATCT 6480

GAAGTGTTTT AACTATTCTG GTTGGAAACA ATTTCTCTGT CATCACAGAT TAACTGAATG 6540

TTTACTCTTT GATAAGGTAT CCATGATTCC GTCATGTTTA ACAGCGCAGG ATAAACAACA 6600

GAATTAACAG AGTGAATTTC TGATTATATT TGTTGCCGGT TGTATTGTTT AAGGTACTGG 6660

GTGAAAATTA TTCATCCATG GTATGTTGTC TTATGCTATC GTGTGTCGTT AACGTTCATA 6720

TCCTGGAGAA CAGATTGAAT GAGCGCATAT AAGTTTATTG CATTGGCCTT GTACACGGTT 6780

TTTACAACCA CTGAGAGCAA GTTTGTAGTT TATGATGTGA TTGGTCGCAA TATGTTTCTT 6840

AACCTTCTGG TCGTGGTGTT TTATGGCGTA TTTTGCAGTA TTTCGTGATG TTTTATTGAG 6900

TCTGTATTTT CTTTACTGCT CGTTTATCTC ATCTCTTTAG CTAATACCAT CAGATAATCC 6960

ATTTCTTTCT GCATAATGCT GCGTATCGTT AATAACCCGT CGTATCCATT CTGCTACAGC 7020

ATGCCTGATA AATACCATCT GTAAGTTATT ACCGTTTTAG ATCTGATTAT GAGCGAAAGC 7080

ATTAATTCGT TCACAGAGCT TAAAACATCA TTAACTTTCA GGAGTCATCA ACATGCCTAA 7140

ATCTTACACA CCAAACTGGT TTTTTACCGC TTTACTTGAC AATCACATCA ATCAAATGAT 7200

GGCACGCTAT TCCTGCCTGC GGGCCTTACG CATGGATTTC TTCTACAGGA AAGATACGCC 7260

CGATTTCTTA CAACCTGATC ATCGCTGGCT TGAATTGCAG TTGCGTATGA TGCTGGAGCA 7320

GGTGGAACAA TTTGAAAATA TCGTTGGCTT CTTCTGGGTG ATTGAATGGA CGGCTGATCA 7380

TGGTTTTCAT GCGCATGCGG TTTTCTGGAT CGATCGTCAG AGGGTTAAAA AAATATATCC 7440

CTTTGCGGAG CGGATTACGG AATGCTGGCG GTCTATTACG CATAACAGCG GTTCGGCACA 7500

CCGCTGCACA TATCAGCCGC ATTATACATA CAACATCAAC ATTCCTGTGC GCCACAACGA 7560

TCCTGAAAGC ATCGATAATA TTCGCGGTGC CCTGCATTAT CTGGCGAAAG AAGAGCAAAA 7620

AGACGGGCTG TGTGCTTACG GCTGCAATGA AGTTCCTGAA CGTCCTGCTG CAGGGCGTCC 7680

TCGTAAGCCT CACTTCTGAA GCTTAAGGCC TGAGCCTTCG CTCCTGGAAA CACTCCGTCG 7740

GAAPTGAAGT CAACGGAGAT CATTCATCCT 7800

GAACCTGCAT CGGGTGTTTT GTTCCTTGTC TTCCCGTTCT GCTTCGGTTC TTCACTTATT 7860

CCATCAATCT CATTCCGCAA GCCATAACAC GTCAGCTCAT TCACGGGCAG GACGCATTGT 7920

GGGCTGCGCA TAACGGAACA TATCTTATGA ATGCTATTCC TTATTTCGAC TATAGCCTGG 7980

CACCCTTCTG GCCATCTTAT CAGAACAAAG TCATCGGCGT CCTTGAGCGT GCGCTGCGTG 8040

AGCAGTCCGG CTCACGGATA CGGCGGATCC TGCTTCGTCT GCCGTGGGAA CATGACAACG 8100

CCTTCAGCAG CAGAAAGATC TGGT7CGGTA TGGACTTTAT CGAAACCGTC AGTGCGCTGA 8160

TGAATGCGAA ACCCGGACGC GAGCTTTGCT GGCTCCTGAC CCGTCATCCG GAAAAGCCGG 8220 

AATACCACGT GGTGCTGTGG GTCAGACAGG AGTATTTCGA GGGCCCCGAA CTGGATCGGT 8280

TGATAGTGGA TGCCTGGAGT AATGTGCTGG GTTTCGCGTC ACCAGGTGAA GGAAAGCCGT 8340

ACCAGAAGCA GATCACGCGG GATGTGGTAC TGGATCGCCG GTCACCGGAC TGCGAAGCGG 8400

TGTTTAAGGA CCTTATCTGG GCGTTCAGTG ATTTCGCCCG CGATCGCCGT GGAGTGTGCG 8460

ATCCGGAAGC CCGTTGCCTT GCCGGCAATC CCGGTTGGCA GTGGTGAAAG CAGCAGGCCA 8520 

TCCCATCCCC CGTATTACCG CATTCTTCAT AAATCTCACT GAGGACATTC TGAGGATGTT 8580 

GACCACAACA AGCCACGACA GCGTATTGCT GCGTGCCGAC GATCCCCTGA TCGAGATGAA 8640

CTACATCACC AGTTTCACCG GCATGACCGA TAAATGGTTT TAGAGGCTGA TGAGTGAAGG 8700

GCATTTTCGT AAACCCATCA AACTGGGGCG CAGCAGCCGC TGGTACAAAA GTGAAGTGGA 8760

GCAGTGGATG CAACAACGAA TTGAGGAATC ACGAGGAGCA GCAGCATGAA ACGTGTTGTG 8820

ATGCCAGTAC GTTGGCAATG TGCAAAATGC CAGCGCTGGT ATTGTGGAAA TCAGCCCTGT 8880 

CCCTGGTGCT GGCGACATTC CCGCTTATCT TTCCGCTGAC ACCCTCCGGT CAGGCAACTG 8940

TTAGTCATCA TTTCCTGACT GATTCGTCAT TCCATTCTTA TTGATTATAA CTGGCATTAC 9000 

ACCGGTGCTG GCGTGCTTTC CTGCGTGTCT GCACCGGTTT GAGAAAATTC AACAGGGTTT 9060 

GAAAAGGAAC ATTTCGTGCA AATAACCGAA GCCTTAATTT CAGAGCCGGG AGACATCCGG 9120 

CGTTTTATTC AACATGCTGT TGACCACTGG CCGCGTCTGC TGGCAGTCCA CTTCATACTC 9180 

CATTCGACAG AAGGAAACAT CTACGGGCAA CAGATTCATG CATTCTGCAC TTCCTTTTAT 9240

CGACAACTGC ATGAACGTAT TACTGAGAGC AATCACACTG CCAGTCCATC ATCGTCGGTG 9300 

GTATTACGCT GGTTGCGGGA ACAACATGGA GGAGCAACAA TTCGATGCCT GTTGGTGCTC 9360 

AGCCAGACGA GTATTTGTCA CCCGCGAGCC AGTGTCACAG TTGATGAACA ATGTTCGCAA 9420

GTGGTGGATT TACTGCAACA TAGCTGGCAG GTGATAAGTG CTGGCGGACA ATGCCGGGTG 9480

GAAAGGTGTT TTCGGGTTGC CCGGGGTGAT ACATCCGGTC AGTATGTTGG GTTAAAAACA 9540

GTCGCATTGT CTCTGGGGTT ACCGGTTGTG ACCGCGATTA CGGATCGTGC GGTACAGCGC 9600 

TGTACATTGA TTACAGCTCA GTGAATCAGC GCTTTGTGGC TTTTCGTCGG TCATTCTGTC 9660 

AACGCCACGA TGTTTGACCG TTATGGGGAT GCGGACGATT CGGTGCACAG CGTTGTTTCA 9720 

CGGTGGTGGA TGACGCAACA CCGCTGTTAA AAACAGTCGT TCAGTCGTTT GTGTTACCGG 9780 

TTGTGACAAG AATCAGTTGG TAATGGACGT GTGAACCATC TGCGCTTCCG TTGATTTTTA 9840

TGGACTGATA AAGTTTTGGC AGCTGAATCT TTATAGGGAA TGCTCTTCAG TATGCGTACA 9900 

CGAATTGACT ATCTGGCGGA TAAATACTCT TTTAGCGAAC GGAATGAATC TCCACGCGTT 9960 

CGCCGGCAGT GGCAGGATGT TCTGGAGGAG TGTCGGGTGA CAGAGGCCGG ACCAGAAGAA 10020

CGGCTGCGTA TTGCCCTGCT GAATGTGGAT TACGTCACCA GTTTTGAACT GCCTTTTCGC 10080

TTGTTGCTTA CTCGTACACC ACAACTGATT GCCGCGCTTC GGGAAGAATG GGGCCTCAGC 10140

CAGAAAAATG TGGTGTTCAA CGATAAACGG TTTGGCTGCG TGTACAGCCT GAAGGCCAGT 10200

CTTTCTGGTG TACCGGATAC ATTCCGGTAT CATCTGTCTC ATCGTATTCG CCGGATGGTT 10260

GGGAATGAAA ATACATCATG GCCATATCAG CAGATTGCCC GGGAAGTGAA AGTGCCCCGT 10320

GAACGGCTGA AGTATGCGCT GGAAGCCGGT TTACTGGTGA CTGCACTGGA CGGGCTGTTC 10380

TGGTCTGGTA GTCAGCGCAT TGCGGCTGAT ATCCTGAGAC TGAGAAAGAG CGGAATGCCG 10440

GTGGTGACAA CGTCCGTGGA AGCGAGCGAT AACCTGACGG GAACAACCCG CAAAATACCG 10500

GCATACCATC TCTGACATTG CGATGAAGGG CAGATTTCAC CTTGACAGGG GCAGAGTGCC 10560

GCTTTTTATA CTTTATTCCC GTGTCTGAAA AAAATGTGCA AAGGAAACGG GAATGGCAAG 10620

GTCCGATTAC GATTTTATCA ATCTGTCTCT GGGACATGAA CTGAATGAGT GGCTGGCAGA 10680

GAGAGGTTAT GCCGGACAGG CGGATAACCG GAACCGACTG GCAGAGGTGG TTACCCGCAA 10740

ATTGCGGGAC AGTTTTTATG CGGACGTCTC CTGGGATGCG CTGAATGTGG CATACAGTGA 10800

ACACCCTGAG TGGTTTTCAG AGCTTGCCTC CGGGGATGAG GATTAACAGG CAAATTATGC 10860

TGCTATCGGG CAGAGTGATT ACCTGCAGGG ATTTCCATTT ATAAGAATAC GCCGCTTCGG 10920

GAAAGCTCCG GTTGTCCGGA GAGTTACGAT TATTTTTACT CAAATTCACA ACACCTGAAC 10980

TGGAACTTGC GTTGTGTCCC GGATTGTTAC TCCGCAGAAG CATCCTTTTT ACCATACGGA 11040

TGTTTGTTTT CCATTTCCCC TCCGAAAAAT ACAACTCCGA TCACATTTCT GATATTTTCC 11100

CCGGATTTTA CATAACAGGA TTGTTTCTGT ATGTTTTTTA TCTGGTGTAA ATTTCAGCAC 11160

TGACATTCCG CTTACGTTAA TTTACACTGG ATACCCCACG AGGAGAATAT GCAGCACCGG 11220

CAGGATAACT TACTGGCGAA CAGAAATTTG TTGCCTGGTA TGGTTTCCGG TCAGTACGCA 11280

TTCAGGATCC GTACCTTATC TCAGGTGGTA CGCTATTTTT CCCTCCTCCC CTGCCTTTGC 11340

ATTCTTTCAT TTTCGTCTCC GGCAGCCATG CTGTCTCCGG GTGACCGCAG TGCAATTCAG 11400

CAGCAACAGC AGCAGTTGTT GGATGAAAAC CAGCGCCAGC GTGATGCGCT GGAGCGCAGT 11460

GCGCCGCTGA CCATCACGCC GTCTCCGGAA ACGTCTGCCG GTACTGAAGG TCCCTGCTTT 11520

ACGGTGTCAC GCATTGTTGT CAGTGGGGCC ACCCGACTGA CGTCTGCAGA AACCGACAGA 11580

TPAGTGTCTG AATATCACGG GACTGACCGC GGTCACGGAT 11640

GCCGTGACGG ACGGCTATAT ACGCCGGGGA TATATCACCA GCCGGGCCTT TCTGACAGAG 11700

CAGGACCTTT CAGGGGGCGT ACTGCACATA AGGGTCATGG AAGGCAGGCT GCAGCAAATC 11760

CGGGCGGAAG GCGCTGACCT TCCTGCCCGC ACCCTGAAGA TGGTTTTCCC GGGAATGGAG 11820

GGGAAGGTTC TGAACTGCGG GATATTGAGC AGGGGATGGA GCAGATTAAT CGTCTGCGTA 11880

CGGAGCCGGT ACAGATTGAA ATATCGCCCG GTGACCGTGA GGGATGGTCG GTGGTGACAC 11940

TGACGGCATT GCCGGAATGG CCTGTCACAG GGAGCGTGGG CATCGACAAC AGCGGGCAGA 12000

AGAATACCGG TACGGGGCAG TTAAATGGTG TCCTTTCCTT TAATAATCCT CTGGGGCTGG 12060

CTGACAACTG GTTTGTCAGC GGGGGAGGGA GCAGTGACTT TTCGGTGTCA CATGATGCGA 12120

GGAATTTTGC CGCCGGTGTG AGTCTGCCGT ATGGCTATAC CCTGGTGGAT TACACGTATT 12180

CATGGAGTGA CTACCTCAGC ACCATTGATA ACCGG3GCTG GCGGTGGCGT TCCACGGGAG 12240

ACCTGCAGAC TCACCGGCTG GGACTGTCGC ATGTCCTGTT CCGTAACGGG GACATGAAGA 12300

CAGCACTGAC CGGAGGTCTG CAGCACCGCA TTATTCACAA TTATCTGGAT GATGTTCTGC 12360

TTCAGGGCAG CAGCCGTAAA CTCACTTCAT TTTCTGTCGG GGTGAATCAC ACACACAAGT 12420

TTCTGGGTGG TGTCGGAACA CTGAATCCGG TATTCACACG GGGGATGCCC TGGTTCGGCG 12480

CAGAAAGCGA CCACGGGAAA AGGGGAGACC TGCCCGTAAA TCAGTTCCGG AAATGGTCGG 12540

TGAGTGCCAG TTTTCAGCGC CCCGTCACGG ACAGGGTGTG GTGGCTGACG AGCGCTTATG 12600

CCCAGTGGTC ACCGGACCGT CTTCATGGTG TGGAACAACT GAGCCTCGGG GGTGAGAGTT 12660

CAGTGCGTGG CTTTAAGGAG CAGTATATCT CCGGTAATAA CGGCGGTTAT CTGCGAAATG 12720

AGCTGTCCTG GTCTCTGTTC TCCCTGCCAT ATGTGGGGAC AGTCCGTGCA GTGACTGCAC 12780

TGGACGGCGG CTGGCTGCAC TCTGACAGAG ATGACCCGTA CTCGTCCGGC ACGCTGTGGG 12840

GTGCTGCTGC CGGGCTCAGC ACCACCAGTG GTCATGTTTC CGGTTCGTTC ACTGCCGGAC 12900

TGCCTCTGGT TTACCCGGAC TGGCTTGCCC CTGACCATCT CACGGTTTAC TGGCGCGTTG 12960

CCGTCGCGTT TTAAGGGATT ATTACCATGC ATCAGCCTCC CGTTCGCTTC ACTTACCGCC 13020

TGCTGAGTTA CCTTATCAGT ACGATTATCG CCGGGCAGCG GTTGTTACCG GCTGTGGGGG 13080

CCGTCATCAC CCCACAAAAC GGGGCTGGAA TGGATAAAGC GGCAAATGGT GTGCCGGTCG 13140

TGAACATTGC CACGCCGAAC GGGGCCGGGA TTTCGCATAA CCGGTTTACG GATTACAACG 13200

TCGGGAAGGA AGGGCTGATT CTCAATAATG CCACCGGTAA GCTTAATCCG ACGGAGCTTG 13260

GTGGACTGAT ACAGAATAAC CCGAACCTGA AAGCGGGCGG GGAAGCGAAG GGTATCATCA 13320

ACGAAGTGAC CGGCGGTAAC CGTTCACTGT TGCAGGGGTA TACGGAAGTG GCCGGCAAAG 13380

CGGCGAATGT GATGGTTGCC AACCCGTATG GTATGACCTG TGACGGCTGT GGTTTTATCA 13440

ACACGCCGCA CGCGACGCTC ACCACAGGCA AACCTGTGAT GAATGCCGAC GGCAGCCTGC 13500

AGGCGCTGGA GGTGACTGAA GGCAGTATCA CCATCAATGG CGCGGGCCTG GACGGCACCC 13560

GGAGCGATGC CGTATCCATT ATTGCCCGTG CAACGGAAGT GAATGCCGCG CTTCATGCGA 13620

AGGATTTAAC TGTCACTGCA GGCGCTAACC GGATAACTGC AGATGGTCGC GTCAGTGCCC 13680

TGAAGGGCGA AGGTGATGTG CCGAAAGTTG CCGTTGATAC CGGCGCGCTC GGTGGAATGT 13740

ACGCCAGGCG TATTCATCTG ACCTCCACTG AAAGTGGTGT CGGGGTTAAT CTGGGTAACC 13800

TTTATGCCCG CGAGGGCGAT ATCATACTGA GCAGTGCCGG AAAACTGGTC CTGAAGAACA 13860

GCCTTGCCGG CGGCAATACC ACCGTAACCG GAACGGATGT CTCACTTTCA GGGGATAACA 13920

AAGCCGGAGG AAATCTCAGC GTTACCGGGA CAACGGGACT GACACTGAAT CAGCCCCGTC 13980

TGGTGACGGA TAAAAATCTG GTGCTGTCTT CATCCGGGCA GATTGTACAG AACGGTGGTG 14040

AACTGACTGC CGGACAGAAC GCCATGCTCA GTGCACAGCA CCTGAACCAG ACTTCCGGGA 14100

CCGTGAATGC AGCTGAAAAT GTCACCCTTA CCACCACCAA TGATACCACA CTGAAAGGCC 14160

GCAGCGTTGC CGGGAAAACA CTCACTGTCA GTTCCGGCAG CCTGAACAAC GGTGGGACAC 14220

TGGTTGCCGG GCGCGATGCC ACGGTGAAAA CCGGGACATT CAGTAATACC GGTACCGTCC 14280

AGGGGAATGG CCTGAAAGTT ACCGCCACTG ACCTGACCAG CACCGGCAGT ATTAAAAGTG 14340

GCAGCACACT CGATATCAGC GCCCGCAATG CCACACTGTC CGGTGATGCC GGTGCAAAAG 14400

ACAGTGCCCG CGTTACCGTC AGCGGTACAC TCGAAAACCG CGGCAGACTT GTCAGCGATG 14460

ACGTGCTGAC GCTCAGTGCC ACGCAGATAA ACAACAGCGG TACCCTCTCC GGGGCAAAGG 14520

AACTTGTGGC TTCTGCAGAC ACACTGACCA CCACAGAAAA ATCGGTCACA AAGAGTGACG 14580

GTAACCTCAT GCTGGACAGC GCGTCTTCCA CACTGGCGGG TGAAACCAGT GCGGGTGGCA 14640

CGGTGTCTGT AAAAGGCAAC AGTCTGAAGA CCACGACCAC TGCGCAGACG CAGGGCAACA 14700

GTGTCAGCGT GGATGTGCAG AACGCACAGC TTGACGGAAC ACAGGCTGCC AGAGACATCC 14760

TTACCCTGAA CGCCAGTGAA AAGCTCACCC ACAGCGGGAA AAGCAGTGCC CCGTCGCTCA 14820

GCCTCAGTGC GCCGGAACTG ACCAGCAGCG GCGTACTTGT TGGTTCCGCC CTGAATACAC 14880

AGTCACAGAC CCTGACCAAC AGCGGTCTGT TGCAGGGGGA GGCCTCACTC ACCGTTAACA 14940

CACAGAGGCT TGATAATCAG CAGAACGGCA CGCTGTACAG TGCTGCAGAC CTGACGCTGG 15000

ATATACCGGA CATCCGCAAC AGCGGGCTTA TCACCGGTGA TAATGGTTTA ATGTTAAATG 15060

CTGTCTCCCT CAGCAATCCG GGAAAAATCA TCGCTGACAC GCTGAGCGTC AGGGCGACCA 15120

CGCTGGATGG TGACGGCCTG TTGCAGGGCG CCGGTGCACT GGGGCTTGCT GGCGACACCC 15180

TCTCACAGGG TAGTCACGGA CGCTGGCTGA CGGCGGACGA CCTCTCCCTC CGGGGCAAAA 15240

CACTGAATAC CGCAGGACCA CGCAGGGACA GAATATCACC GTGCAGGGGG ACAGATGGGC 15300

GAACAGTGGT TCCGTGCTGG CAACCGGTAA CCTTACTGCT TCGGCAACCG GTCAGTTGAC 15360

CAGTACCGGC GATATCATGA GCCAGGGTGA CACCACGCTG AAAGCAGCCA GCACGGACAA 15420

CCGGGGCAGT CTGCTTTCGG CCGGCACGCT CTCCCTTGAT GGAAACTCAC TGGATAACAG 15480

CGGCACTGTC CAGGGTGACC ATGTCACGAT TCGCCAGAAC AGTGTCACCA ACAGTGGCAC 15540

GCTCACCGGG ATCGCCGCGC TGACGCTTGC CGCCCGTATG GTATCCCCTC AACCTGCGCT 15600

GATGAATAAC GGAGGTTCAT TGCTGACCAG CGGCGATCTG ACAATCAGCG GAGGCAGTCT 15660

GGTAAACAGC GGGGCGATCC AGGGGGCTGA CAGCCTGACT GCACGTCTGA GGGGTGAGCT 15720

CGTCAGCACA GCGGGCAGCA AAGTCACCTC GAACGGTGAA ATGGCGCTCA GTGCACTGAA 15780

TTTAAGCAAC AGCGGACAAT GGATTGCAAA AAATCTGACC CTGAAGGCGA ACTCACTGAC 15840

CAGTGCGGGT GACATCACCG GTGTGGATAC TCTCACGCTC ACGGTGAATC AGACGCTGAA 15900

CAATCAGGCG AACGGAAAAC TGCTCAGTGC AGGTGTGCTG ACGGTGAAGG CAGACAGTGT 15960

CACAAACGAC GGGCAATTAC AGGGAAATGC CACCACCATC ACGGCAGGAC AACTCACAAA 16020

CGGCGGGCAT CTGCAGGGCG AAACGCTGAC GCTGGCCGCC TCCGGTGGCG TGAACAACCG 16080

TTCCGGTGGT GTTCTGATGA GCCGGAATGC ACTGAATGTC AGTACTGCGA CCCTGAGTAA 16140

CCAGGGCACG ATACAGGGTG GTGGCGGGGT TTCCCTGAAC GCCACTGACC GTCTGCAGAA 16200

CGACGGCAAA ATCCTCTCCG GCAGTAACCT CACGCTGACG GCGCAGGTGC TGGCGAACAC 16260

CGGCAGCGGA CTGGTACAGG CTGCCACCCT GCTGCTGGAT GTGGTGAATA CTGTCAACGG 16320

CGGACGCGTA CTTGCCACCG GCAGTGCCGA CGTTAAAGGA ACCACGCTGA ATAATACCGG 16380

TACGCTTCAG GGTGCGGACC TGCTGGTGAA TTACCACACA TTCAGCAACA GCGGTACCCT 16440

GCTGGGAACC TCCGGGCTTG GCGTCAAGGG CAGTTCACTG CTGCAAAATG GTACAGGGCG 16500

GCTGTACAGT GCAGGCAACG TGGTGCTTGA CGCTCAGGAC TTCAGTGGTC AGGGGCAGGT 16560

GGTGGCCACC GGTGATGTCA CACTGAAACT GATTGCTGCC CTCACGAATT ACGGTACCCT 16620

GGCCGCAGGG AAAACCCTTT CCGTCACGTC GCAAAATGCC ATCACCAACG GCGGTGTCAT 16680

GCAGGGTGAT GGCATGGTGC TCGGTGCCGG AGAGGCATTC ACCAACAATG GAACGCTGAC 16740

TGCCGGTAAA GGCAACAGTG TTTTCAGCGC ACAGCGTCTT TTCCTTAACG CACCGGGTTC 16800

ACTTCAGGCC GGTGGCGATG TGAGTCTGAA CAGCCGGAGT GATATCACCA TCAGTGGTTT 16860

TACCGGCACG GCAGGCAGTC TGACAATGAA TGTGGCCGGT ACCCTGCTGA ACAGTGCGCT 16920

GATTTATGCG GGGAATAACC TGAAGGTGTT TACAGACCGT CTGCATAACC AGCATGGTGA 16980

TATCCTGGCC GGCAACAGTC TGTGGGTACA GAAGGATGCT TCCGGCGGTG CAAACACAGA 17040

GATTATCAAT ACTTCCGGGA ATATTGAGAC GCATCAGGGC GATATTGTTG TAAGAACCGG 17100

GCATCTTCTG AACCAGCGGG AGGGATTTTC TGCCACAACA ACAACCCGGA CTAACCCCTC 17160

ATCCATTCAG GGAATGGGAA ATGCTCTGGT TGATATTCCC CTTTCCCTTC TTCCTGACGG 17220

CAGCTATGGC TATTTCACCC GTGAAGTTGA AAATCAGCAC GGTACGCCCT GCAACGGGCA 17280

CGGGGCATGC AATATCACAA TGGATACGCT TTATTATTAC GCTCCGTTTG CTGACAGTGC 17340

CACACAGCGC TTTCTCAGCA GCCAGAACAT CACAACAGTA ACCGGTGCTG ATAATCCGGC 17400

AGGCCGCATT GCGTCAGGGC GTAATCTTTC TGCTGAGGCT GAACGACTGG AAAACCGGGC 17460

GTCATTTATC CTGGCGAATG GGGATATCGC ACTCTCGGGC AGAGAGTTAA GCAATCAGAG 17520

CTGGCAGACG GGGACAGAGA ATGAATATCT GGTATACCGC TACGACCCGA AAACGTTTTA 17580

CGGTAGCTAT GCAACAGGCT CTCTGGATAA ACTGCCCCTG CTGTCACCGG AATTTGAAAA 17640

CAATACCATC AGATTTTCAC TGGATGGCCG GGAAAAAGAT TACACGCCCG GTAAGACGTA 17700

TTATTCCGTT ATTCAGGCGG GCGGGGATCT TAAGACCCGT TTTACCAGCA GTATCAATAA 17760

CGGAACAACC ACTGCACATG CAGGTAGTGT CAGTGCGGTG GTCTCTGCAC CTGTACTGAA 17820

TACGTTAAGT CAGCAGACCG GCGGAGACAG TCTGACACAG ACAGCGCTGC AGCAGTATGA 17880

GCCGGTGGTG GTTGGCTCTC CGCAATGGCA CGATGAAGTG GCAGGTGCCC TGAAAAATAT 17940

TGCCGGAGGT TCGCCACTGA CCGGTCAGAC CGGTATCAGT GATGACTGGG CACTGCCTTC 18000

CGGCAACAAT GGATACCTGG TTCCGTCCAG GGACCCGGAC AGTCCGTATC TGATTACGGT 18060

GAACCCGAAA CTGGATGGTC TCGGACAGGT GGACAGCCAT TTGTTTGCCG GACTGTATGA 18120

GCTTCTTGGA GCGAAACCGG GTCAGGCGCC ACGTGAAACG GCTCCGTCGT ATACCGATGA 18180

AAAACAGTTT CTGGGCTCAT CGTATTTTCT TGACCGCCTC GGGCTGAAAC CGGAAAAAGA 18240

TTATCGTTTG CTGGGGGATG CGGTCTTTGA TACCCGGTAT GTCAGTAACG CGGTGCTGAG 18300

CCGGACGGGT TCACGTTATC TCAACGGACT GGGTTCAGAG ACGGAACAGA TGCGGTATCT 18360

GATGGATAAC GCGGCCAGAC AACAGAAAGG ACTGGGATTA GAGTTTGGTG TGGCGCTGAC 18420

AGCTGAACAG ATTGCTCAGC TTGACGGGAG CATGCTGTGG TGGGAGTCAG TCACCATCAA 18480

CGGACAGACA GTCATGGTCC CGAAACTGTA TCTGTCGCCG GAAGATATCA CCCTGCATAA 18540

CGGCAGCGTT ATCAGCGGGA ACAACGTGCA GCTTGCGGAC GGCAATATCA CCAACAGCGG 18600

CGGCAGCATC AACGCACAGA ACGACCTTTC GCTCGACAGT ACGGGCTATA TCGACAACCT 18660

GAATGCAGGG CTGATAAGCG CGGGCGGTAG CCTGGACCTG AGCGCCATCG GGGATATCAG 18720

CAATATCAGG TCAGTCATCA GCGGTAAAAC CGTACAACTG GAAAGCGTGA GTGGGAACAT 18780

CAGCAATATC AGCCGGCGTC AGCAATGGAA TGCGGGCAGT GACAGCCGAT ATGGTGGTGT 18840

GCATCTCAGC GGTACGGACA CCGGTCCGGT TGCGACCATT AAAGGCACTG ATTCACTTTC 18900

ACTGGATGCA GGGAAAAACA TTGATATTAC CGGGGCAACG GTCTCGTGCG GTGGAGACCT 18960

TGGAATGTCT GCGGGTAATG ACATCAACAT TGCCGTAAAC CTGATAAGCG GGAGCAAAAG 19020 

TCAGTCCGGT TTCTGGCACA CTGATGACAA CAGTTCATCA TCCACCACCT CACAGGGCAG 19080 

CAGCATCAGC GCCGGCGGTA ACCTGGCGAT GGCTGCAGGC CATAATCTGG ATGTCACAGC 19140

ATCCTCTGTT TCTGCCGGGC ACAGCGCCCT GCTTTCTGCA GGTAACGACC TGAGTCTGAA 19200 

TGCAGTCAGG GAAAGCAAAA ACAGTCGCAA CGGCAGGTCA GAAAGTCATG AAAGCCACGC 19260

AGCTGTGTCC ACGGTGACGG CGGGCGATAA CCTCCTCCTT GTTGCCGGTC GTGATATTGC 19320 

CAGTCAGGCT GCCGGTATGG CTGCGGAAAA TAACGTGGTC ATCCGGGGCG GACGTGATGT 19380 

GAACCTGGTG GCAGAGTCTG CCGGCGCAGG CGACAGCTAT ACGTCGAAGA AAAAGAAAGA 19440 

GATTAACGAG ACAGTCCGTC AGCAGGGAAC GGAAATCGCC AGCGGTGGTG ACACCACCGT 19500 

CACCGCAGGA CGGGATATCA CCGCTGTTGC GTCATCCGTT ACCGCAACCG GCAATATCAG 19560 

CGTGAATGCC GGTCGTGATG TTGCCCTGAC CACGGCGACA GAAAGTGACT ATCACTATCT 19620
GGAAACGAAG AAAAAAAGCG GAG3TTTTCT CAGTAAGAAA ACCACCCACA CCATCAGTGA 19680

GGACAGTGCC TCCCGTGAAG CAGGTTCCCT GCT2TCGGGG AACCGCGTGA CCGTTAACGC 19740

CGGTGATAAN CTGACGGTAG AGGGTTCGGA TGTGGTGGCT GACCGGGATG TGTCACTGGC 19800

GGCGGGTAAC CATGTTGATG TTCTTGCTGC CACCAGTACA GATACGTCCT GGCGCTTTAA 19860

GGAAACGAAG AAATCCGGTC TGATGGGTAC CGGCGGTATT GGTTTCACCA TTGGCAGCAG 19920

TAAGACAACG CACGACCGCC GCGAGGCSGG GACAACGCAG AGTCAGAGTG CCAGTACCAT 19980

CGGCTCCACT GCCGGTAATG TCAGTATTAC CGCGGGCAAA CAGGCTCATA TCAGCGGTTC 20040

GGATGTGATT GCGAACCGGG ATATCAGCAT TACCGGTGAC AGTGTGGTGG TTGACCCGGG 20100

GCATGATCGT CGTACTGTGG ACGAAAAATT TGAGCAGAAG AAAAGCGGGC TGACGGTTGC 20160

CCTTTCCGGC ACGNTGGGCA GTGCCATCAA TAATGCGGTC ACCAGTGCAC AGGAGACGAA 20220

GGAGAGCAGT GACAGCCGTC TGAAAGCCCT GCAGGCCACA AAGACAGCGC TGTCTGGTGT 20280

GCAGGCCGGA CAGGCTGCGG CAATGGCCAC CGCAACCGGT GACCCGAATG CGACGGGAGT 20340

CAGCCTGTCG CTTACCACCC AGAAATCGAA ATCACAACAA CATTCTGAAA GTGACACAGT 20400

ATCCGGCAGT ACGCTGAATG CCGGGAATAA TCTGTCTGTT GTCGCAACCG GCAAAAACAG 20460

GGGAGATAAC CGCGGAGATA TTGTGATTGC AGGAAGCCAG CTTAAGGCCG GTGGTAACAC 20520

AAGCCTGGAT GCCGCGAATG ATGTTCTGTT GAGTGGCGCT GCAAACACAC AAAAAACAAC 20580

GGGCAGGAAC AGCAGCAGTG GCGGTGGCGT GGGTGTCAGT ATCGGTGCCG GTGGTAACGG 20640

TGCCGGTATC AGCGTCTTTG CCAGCGTTAA TGCGGCAAAA GGCAGCGAGA AAGGTAACGG 20700

TACTGAGTGG ACTGAAACCA CAACAGACAG CGGTAAAACC GTCACCATCA ACAGTGGTCG 20760

GGATACGGTA CTGAACGGTG CTCAGGTCAA CGGCAACAGG ATTATCGCCG ATGTGGGCCA 20820

CGACCTGCTG ATAAGCAGCC AGCAGGACAC CAGTAAGTAC GACAGTAAAC AGACCAGCGT 20880

GGCTGCCGGC GGCAGTTTTA CCTTTGGCTC CATGACCGGC TCAGGTTACA TCGCTGCCTC 20940

CCGGGATAAG ATGAAGAGCC GCTTTGACTC CGTTGCTGAA CAAACCGGGA TGTTTTCCGG 21000

AGATGGCGGC TTCGATATCA CGGTCGGCAA CCACACCCAG CTCGATGGTG CGGTTATCGC 21060

TTCCACGGCG ACGGCAGATA AAAACAGCCT CGATACCGGG ACGCTCGGCT TCAGCGATAT 21120

TCACAACGAA GCGGATTATA AAGTCAGTCA CAGTGGAATC AGTCTGAGCG GTGGTGGCAG 21180

CTTCGGGGAT AAATTTCAGG GTAACATGCC GGGTGGCATG ATATCCGCCG GAGGTCACAG 21240

CGGACATGCG GAAGGAACGA CTCAGGCCGC AGTGGCAGAT GGCACAATCA CCATCCGGGA 21300

CAGGGACAAT CAGAAGCAGA ATCTGGCGAA CCTGAGCCGT GACCCTGCGC ACGCTAATGA 21360

CAGTATCAGC CCGATATTTG ACAAGGAGAA AGAGCAGAGG CGTCTGCAGA CAGTGGGGCT 21420

TATCAGTGAC ATTGGCAGTC AGGTGGCGGA TATCGCGCGG ACGCAGGGGG AACTGAATGC 21480

GTTGAAGCTG CGCAGGATAA ATATGGGCCT GTTCCGGCGG ATGCGACGGA AGAACAGCGG 21540

CAGGCATATC TGGCAAAACT GCGTGATACG CCGGAATACA AAAAGGAACA GGAAAAGTAT 21600

GGTACCGGCA GCGATATGCA GCGCGGTATC GAGGCTGCAA CGGCTGCACT TCAGGGCCTG 21660

GTGGGCGGCA ATATGGCAGG CGCGCTGGCA GGTGCTTCAG CGCCGGAGCT GGCGAACATC 21720

ATCGGTCATC ACGCGGGTAT TGATGACAAT ACAGCGGCAA AAGCCATTGC CCATGGCATT 21780

CTCGGTGGTG TGACAGCAGC CCTTCAGGGC AACAGTGCGG CAGGAGGCGG AATTGGTGCG 21840

GGTACTGGTG AAGTGATCGC GTCAGCCATT GCGAAAAGCG TCTACCCGGG CGTAGATCCG 21900

TCGAAACTGA CAGAAGATCA GAAGCAAACT GTAAGCACGC TGGCAACGCT GTCAGCGGGT 21960

ATGGCCGGCG GCATTGCCAG TGGCGATGTG GCTGGCGCGG CTGCTGGAGC TGGTGCCGGG 22020

AAGAACGTTG TTGAGAATAA TGCGCTGAGT CTGGTTGCCA GAGGCTGTGC GGTCGCAGCA 22080

CCTTGCAGGA CTAAAGTTGC AGAGCAGTTG CTAGAAATCG GGGCGAAAGC GGGCATGGCC 22140

GGGCTTGCCG GGGCGGCAGT CAAGGATATG GCCGACAGGA TGACCTCCGA TGAACTGGAG 22200

CATCTGATTA CCCTGCAAAT GATGGGTAAT GATGAGATCA CTACTAAGTA TCTCAGTTCG 22260

TTGCATGATA AGTACGGTTC CGGGGCTGCC TCGAATCCGA ATATCGGTAA AGATCTGACC 22320


GATGGGGAAA 


AAGTAGAACT 


GGGC bb I 1 LL 








22380


GAAAATGATC CTAAGCAGCA AAATGAAAAA ACTGTAGATA AGCTTAATCA GAAGCAAGAA 22440

AGTGCGATTA AGAAGATCGA TAACACTATA AAAAATGCTC TGAAAGATCA TGATATTATT 22500

GGAACTCTCA AGGATATGGA TGGTAAGCCA GTTCCTAAAG AGAATGGAGG ATATTGGGAT 22560

CATATGCAGG AAATGCAAAA TACGCTCAGA GGATTAAGAA ATCATGCGGA TACGTTGAAA 22620

AACGTCAACA ATCCTGAAGC TCAGGCTGCG TATGGCAGAG GAACAGATGC T 22671



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2385 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GGGCGACACG GAAATGTTGA ATACTCATAC TCTTCCTTTT TCAATATTAT TGAAGCATTT 60 

ATCAGGGTTA TTGTCTCATG AGCGGATACA TATTTGAATG TATTTAGGCA ACTGAAACCC 120 

GCTGACGGAT NANGTGTACA GTGGCATCAG TGGACGGMTT ACAGCATAAG TGCTTAAGGC 180 

GCGTGACCAT ACAGMTACGG TCGCTGCAGA GAACAGGGAG AATATCATCC GGAACACGGT 240

GGCCATAAAC CGTAACACCA GGGGGCTGCT TTCCCCGGGA GAGGTGCTGG AGATGCATGC 300 

GGACGTCTGA ACAGTCAGCA GGGCTGATTA ATGAGAATCA CGAGGAAATG AAGCGGGAGC 360

CGTACAGTGA GGATAAATTT AACGCCATAG CGGCTGTGGG CGGGTATAGT GCCAAGCAGA 420
CTGCTTAAAG GCAGGTACTA CTTTCAGTGG CGGCTATGTT TCCTGGAATG TGGGTGTCAA 480

CTGGTAGTTG TGAACCCGGG CCTGAGTCAC CGGGGAGGCA GTTTTCGGTA TGAAGTAATG 540

ATTCGCTGCC TGTTTTTCTC CCCGATGGCA TAACTGACTG TTCCCGGGTA TTCCTGAAGA 600 

TCTGAGAGGA AGAGTGTATA TGCTGAACTA TCGCATAAGG TCAGTGCAGC TATTTATTGT 660 

AAACGGTCGG GCTGACAGGG GGCAGGTGCG TCTGGAATGC GACGATGAAG CCGTTTTTGA 720

ATGTTATCTT CTTGCTGAAG GGGAAGGGGA ACTGAAAGAA CTGAGCCTGT CAGAGCTGGA 780

AGAGCGGGCG CTGATGTATG CGGCAGACAG TTTCCGTTAT GAATGATAAG TCAGTTATAC 840

CGGTAATGGT AAACGGAGCC GGTATCCGGG ATACAAGGGG CAGAGAGTAT GCTGATTATT 900

ATTATGACCC GGGACAGATA TCTGGAATAT GGCCTGATGC GTATACTGAG CGGATATCAG 960

GTCACGACAG GCAGAGAGCT GTTTAATGCC GGAAAGCAAC GTCAGTCACT TCCCGAAGAC 1020

AGTTATGTGA TTCTCTGTGA CCGTAATCTG GAAAGGCTTA CATACTCTAT GTTCTGTGGG 1080

CGTCGGTTTC TTGTCATTCC TGTTTCCTCT GTGAGATGCC TGACAGATAT CAGGCAAACC 1140

ATCCGCCGTG GAGCGTGGCT GTTCGGACAT ACGGCAAGGC CACTGACCCG GACAGAGATG 1200

GTGGTGGTCT TCGGGGTTGT TTTCCATGAC TACGGGTTTA CCTTTCTGGC AGACCGGCTG 1260

GGGATAACCA TGAAGACGGT ATGTGCGCAT CTTTACAATG CGATGGAGAA AAATGGTATG 1320 

CGCGGCGTCA GTATTAAATA TCTCTGCAAC ACCATAGACC GGTAAAAAGA TGGTTTTCTG 1380

ATAAAGGCTG TTGCGACGGG GATTTCTGTG CATGCTGTGT CACGGGCATC CCAGCTCTCC 1440

GGATAATTAA TGTTATGTAG TCAGGCGTGA TAAATTTCAT ATGGAACAGG TATGCGTTTT 1500 

ATTTGTGATA ACAGTTAATG AGGTGTTTCC ATACACACTG AAGTTACCTG TAATATTAGC 1560

GGGGGATTTG AATGATGTTG CGTGTCTGCG ACCACTCGTT TATTCATGCA AATAAGTGGA 1620

CTGCTGGATC CACGGTAAGA GTACAGCGAG GGCCGTATTG AGGGGGATGT GTTATTCAGC 1680

GGGCAGTGGT ATGCGCCACG GAAGCAGTTC GCTGACACGG TTGACCGGCC AGTCAGCTAT 1740

GACGCCAAAC ACATGGCGAA GGTAGTTTTC TGGATCCTCG TCGTTCAGTT TGCACGTCCC 1800

GATCAGGCTG TACAGTAGCA CTCCCCGCTC ACCACCATGC TCAGAGCTGC GTATTACCGT 1860

GAAGGAGATC GGTGAGTAAC CCTCTGTGTC GGCACATTAT AGCCGTCACA TCGGATAACT 1920

GTTATCCTTC TGTTCTGATG TATTCTGGGA GGTGATGTTT CACTCCTGAT AAGAGCATTA 1980

CTAATTACAG CTGCTTTTCG GATAACATTG GGGCAGTTTT CTTTAATTCT GAAGTCTGAA 2040

AGAGATATCA GTAATTGTAT TGCTTTTAAA GATTGTCAGT ATTTATTTGT CCAAATCGTT 2100 

CACGTTTCTC ATAATCTTCC CGACAGTCAG GATCACAAAA CAATCCAGTC TTAACAGGTT 2160 

CTCCGCAGTT ATAGCAGAAT CCTGTTTCAG GGAGTCTATT CCGGATACGA TTTTTTAGTC 2220 

TGATGCTCAT GCTGAATTGT TCATTTTCAT AAGCAATATC TGGACTATCT GCCATAAACG 2280

ATCCTCTGAG GAGACCACAT CTTTATAACC CACCACCGAA ATATTACAAA GTAATACTCA 234 0 
TTGTATAATC TTTAACCRGG GGCAGGATAA TTGTATCCTG CCCCT 2385

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 4*5 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CTTTCAGACC AGCGTTTCCT GTCAGGAGAT GAGGAAGAAA CATCAAAGTA TAAAGGCGGC 60 

GATGACCATG ATACGGTATT CAGTGGCGGT ATTGCGGCCG GTTATGATTT TTATCCGCAG 120 

TTCAGTATTC CGGTTCGTAC AGAACTGGAG TTTTACGCTC GTGGAAAAGC TGATTCGAAG 180 

TATAACGTAG ATAAAGACAG CTGGTCA3GT GGTTACTGGC GTGATGACCT GAAGAATGAG 240 

GTGTCAGTCA ACACACTAAT GCTGAATGCG TACTATGACT TCCGGAATGA CAGCGCATTC 300 

ACACCATGGG TATCCGCAGG ATTGGCTACG CAGAATTCAC CAGAAAACAA CCGGTATCAG 360

TACCTGGGAT TATGAGTACG GAAGCAGTGG TCGCGAATCG TTGTCACGTT CAGGCTCTGC 420

TGACAACTTC GCATGGAGCC TTGGCGCGGG TGTCCGCTAT GACGTAACCC CGGATATCGC 480

TCTGGACCTC AGCTATCGCT ATCTTGATGC AGGTGACAGC AGTGTGAGTT ACAAGGACGA 540

GTGGGGCGAT AAATATAAGT CAGAAGTTGA TGTTAAAAGT CATGACATCA TGCTTGGTAT 600 

GACTTATAAC TTCTGACGAC ACTGCTCCTG AACGATAATT GCGTATATTC TGTAATTAAG 660 
ATAATTGCAT ATCKTCTGCA ATTAAF.CAGA AATACCCTGC AGTCTATTAC TGCAGGGNTG 
TCTTTTATCT GTTTTACAGA NAATTT 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 411 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
TCTGTTTGTC GTTTTTTCCC CGTTGTAGCG GYTCTGCTCC TGGCTTCCCT GATAGTCAGC 
CCGCAGGCGC CAGGGCCCCA GATTCCCCCC CACAGTCCCG TTATAACTGA ACTGATGAGA 
GTCTCCTCCC TGATAATTAC GGGAAACCGT CCCGTTGAGG TTATAATCCA GCATCAGTCC 
GGGAATGCCG TCGTCCCAGC GTGAGGGAGG CAGCCAGGTG GCATCAGAAT ACTCAAGCCC 
AGCTGCGGCA TATTGATGCG TAATACGCCC GCTCCGGTAT CAGGACGAAT ATCCACTCCC 
GGCAACCCAT GAAAATCCGC ACACTGACCA TCATGCCAGT AAACAACTTT ATCCAGAGAT
TCTGCTGTTA ACCCCATCAG TCTGACCATA TCTGATGTCA GACAGGCCTG C 411
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 977 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

TATTATCGCG CGCGCGCTGC ACAGGGGTTA TCTACATCTG CTGCTGCTGC CGGTTTAATT 60 

GCTTCTGTAG TGACATTAGC AATTAGTCCC CTCTCATTCC TGTCCATTGC CGATAAGTTT 120 

AAACGTGCAA ATAAAATAGA GGAGTATTCA CAACGATTCA AAAAACTTGG ATACGATGGT 180 

GACAGTTTAC TTGCTGCTTT CCACAAAGAA ACAGGAGCTA TTGATGCATC ATTAACAACG 240 

ATAAGCACTG TACTGGCTTC AGTATCTTCA GGTATTAGTG CTGCKGCAAC GACATCTCTT 300 

GTTGGTGCAC CGGTAAGCGC ACTGGTAGGT GCTGTTACGG GGATAATTTC AGGTATCCTT 360

GAGGCTTCAA AGCAGGCAAT GTTTGAACAT GTTGCCAGTA AAATGGCTGA TGTTATTGCT 420

GAATGGGAGA AAAAACACGG TAAAAATTAC TTTGAAAATG GATATGATGC CCGCCATGCT 480

GCATTTTTAG AAGATAACTT TAAAATATTA TCTCAGTATA ATAAAGAGTA TTCTGTTGAA 540

AGATCAGTCC TCATTACTCA ACAACATTGG GATATGCTGA TAGGTGAGTT AGCTAGTGTC 600 

ACCAGAAATG GAGACAAGAC ACTCAGTGGT AAAAGTTATA TTGACTATTA TGAAGAGGGA 660 

AAGCGGCTGG AAAGAAGGCC AAAAGAGTTC CAGCAACAAA TCTTTGATCC ATTAAAAGGA 720

AATATTGACC TTTCTGACAG CAAATCTTCT ACGTTATTGA AATTTGTTAC GCCATTGTTA 780

ACTCCCGGTG AGGAAATTCG TGAAAGGAGG CAGTCCGGAA AATATGAATA TATTACCGAG 840

TTATTAGTCA AGGGTGTTGA TAAATGGACG GTGAAGGGGG TTCAGGACAA GGGGTCTGTA 900

TATGATTACT CTAACCTGAT TCAGCATGCA TCAGTCGGTA ATAACCAGTA TCGGGNAATT 960 

CGTATTGAGT CACACCT 977



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 400 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TTTCTTAAGT CCGGCATTGC CACGCGTAAC CCCCACTTCA ACCGCATGAT TGAGCAGATC 60 
GAAAAAGTGG CGATCAAATC CCGCGCGCCG ATTCTGCTTA ACGGTCCAAC CGGCGCGGGC 120 
AAGTCATTTC TGGCGCGACG CATCTTAGAG TTAAAACAGG CGCGGCATCA GTTTAGCGGC 180

GCKTTTGTGG AAGTGAACTG CGCCACCCTG CGCGGCGATA CCGCCATGTC GACGCTGTTT 240

GGTCATGTAA AAGGGGCGTT TACCGGGGCG CGGGAATCTC GTGAAGGTTT ATTACGCAGC 300 

GCCAACGGGG AAATGTTGTT TCTTGATGAG ATTGGCGAAC TGGGGGCGAC GAACAGGCAA 360

TGCTGCTGAA ACCCATTGAA GRGGAAAACC TTTTACCCGT 400



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12368 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



GTATGCGTTT TCATTAAGAT ATTCTCTGCT GTAGAGAAAC TTATAGCAAT ATAATCTGAT 60

AATATCTTTT ATGTAAAATT TAAATAGTTC ACCTGTGACA GATATATGTT TTCTGCTCAG 120

TAACTCCTGT GTATTAAGCC ATTCCCGTGA CCGAAGCACA CCCTTGTGAA AACTTTTTCT 180

TACTTGCTTT GAGGCACGGC ATTGATGTAA TATTTTTGCG TCCTCAATAA TTCTCTTTCC 240


/~* *"p rp r V *T* T\ r T 1 r T % T* 

L b I I I I A I 1 1 


111 bLAbLA 1 


\_ 1 *w 1 1 1 




X ^^U\JU X V. /i 


VJ^lV— X X X X X V—' 


300 


ATATTTACTG ATTATACGAC AAATATTCCT GACCCGACGA TTCTCTTTAT TTCGCTTCCA 360

TAGCTTATAA TGATCATCGC ATAACCTTAA GGCATTTGCC TCATCAAATT CTGAAACAGG 420

ATTACTGCAT TTTTTATTCC GACAAATACC TTTGTTTTTA GCCATACTCT TCTTCCCGTC 480

AATGGAAAAA TTTTCACACC CATATTACCT GAATGATAAA CCGGATTAGT GTGATCCGGT 540

TCAGTGAAAT CAACAGGATA CCGGTATGCC ATTCAGCAAT TCTTCCCTCT CCGCGCAAGT 600

GAAATCATAT CTGACGTTTC TTCCTGAAGA AATACGCCAG AAAATCCTTG AACATCTCCA 660

CGGTGTTATT CATTACGAGC CCGTGATTGG CATTATGGGT AAATCCGGCA CCGGCAAGAG 720

CAGCCTGTGT AATGCCATTT TTCAGTCCCG TATCTGCGCC ACGCATCCCC TGAACGGCTG 780

CACCCGCCAG GCTCATCGTC TTACCCTGCA GCTCGGTGAA CGCAGAATGA CGCTGGTCGA 840

TCTGCCCGGC ATTGGTGAAA CACCGCAGCA TGATCAGGAA TACCGAGCGC TTTATCGTCA 900

GTTACTGCCG GAACTGGATC TGATTATCTG GATCCTGCGG AGTGATGAAC GTGCGTATGC 960

TGCCGATATT GCCATGCATC AGTTTTTACT GAATGAGGGC GCAGATCCCT CGCGCTTTCT 1020

GTTTGTTCTC AGCCATGCCG ATCGCATGTT TCCTGCTGAA GAATGGAATG CCACAGAAAA 1080

ATGCCCGTCC CGTCACCAGG AACTCTCACT GGCGACAGTA ATAGCCCGGG TGGCCACCCT 1140

GTTCCCTTCA TCATTTCCGG TACTCCCTGT AGCCGCACCT GCAGGCTGGA ACCTTCCAGC 1200

GCTGGTGTCA CTGATGATCC ACGCGCTGCC ACCACAGGCA ACCAGCGCAG TTTATTCACA 1260
TATCAGGGGG GAAAACCGCT CTGAACAGGC CCGGAAACAC GCACAACAGA CTTTTGGTGA 1320

TGCCATCGGG AAAAGTTTTG ACGACGCCGT TGCCCGGTTC AGTTTTCCGG CCTGGATGTT 1380 

ACAGCTTCTG CGTAAAGCCC GGGACCGCAT TATCCACCTG CTGATCACAC TGTGGGAGCG 1440 

TCTGTTCTGA CACACTCACG CCGACAGATG TGTCGCTGGA TTAACGAGCA TTCTTCTTTT 1500 

TATGAAATCA TGCTTAAAAA TCAGATAATT ARAAGAATAT TTTTTCTGCT GCATTTTATT 1560

CCTGATTATC CGGATGCGAC ACATCCTTTC AACATCATGA TGCATAATAA CATCATGAAA 1620 

TAAAAGATGT TTTCTTACGG AGTGCACATC TATGTCTGAT AATCGTTCCG GGCATGATCG 1680 

CCTGGGGGTT CGCTTATCAC TCATTATCAG CCGACTGATG GCCGGAGAAT CTCTGTCACT 1740

AAAAACACTG TCAGATGAAT TTGGCGTTAC AGAACGTACT TTACAGCGCG ATTTTCATCA 1800 

GCGTCTGGTT CACCTAGATT TAGAGTACAG AAATGGCAGG TACAGCCTCA GACGACAGAG 1860

CAGCCCAGGT GCGATCCCTG AAATGCTTTC TTTTATACAG AATACCGGGA TCGCACGGAT 1920

ACTTCCGCTC CGGAACGGAC GACTGATAAC CTGTCTTACC GACAACCAGG AGCCCTCTCC 1980

CTGCCTTATC TGGCTACCGG CGCCGGATAT CACTGCAACG TTCCCCGAGT GTTTCTCGCA 2040

ACTCATCCTG GCAATAAGAC AGTGTATCCA CATCTCTCTG ATGACTGAGC GATGGTATCC 2100 

GTCACTGGAG CCCTGCCGGC TCATTTATTA CAGCGGTAGC TGGTATCTGA TCGCGTTACA 2160 

GAAGGGAAAA CTGCAGGTCT TTCCTCTGGC AGATATCAAA TCAGTCAGCC TGACATCAGA 2220

ACGGTTTGAA CGGAGAGGCC ACATCCACAG TCTGGTCGCT GAAGAGCGTT TTATCTCCGC 2280 

CGTGCCACAT TTCTCTTTCA TCCATAAACT TATCAACACC TTTAACCTGT GATCGCCGGC 2340

CTGCCAAAGC CGTCCCGACA GGTATGGAGA CAATATGTTG AACAGAAAAC TAAATATACG 2400

GCTACGTCAT TCCCTGAACA GTCACTGCAT ACCTTGGATG ATTATCAATA ACACCGTACG 2460

TTCATTTCAG AGGTCAGTCA TGAATACCAG AGCTCTTTTT CCCCTGCTGT TCAGTGTGGC 2520 

ATCATTCTCC GCCTCCGCCG GCAACTGGGC TGTCAAAAAC GGCTGGTGTC AGACGATGAC 2580 

GGAAGATGGT CAGGCGCTGG TAATGCTGAA AAATGGCACG ATTGGTATTA CCGGCCTGAT 2640

GCAGGGATGC CCGAATGGTG TACAGACGCT CCTGGGCAGC CGTATCAGTA TTAACGGTAA 2700

CCTGATCCCC ACATCACAAA TGTGTAATCA GCAGACGGGA TTCAGGGCTG TTGAGGTGGA 2760

AATCGGACAG GCGCCGGAAA TGGTCAAAAA AGCCGTTCAC TCCATAGCAG AGCGTGATGT 2820

GTCCGTTTTA CAGGCATTTG GTGTACGAAT GGAATTCACG CGCGGTGATA TGCTGAAGGT 2880

CTGTCCGAAA TTTGTCACAT CACTTGCCGG TTTTTCCCCG AAACAGACGA CCACTATTAA 2940

TAAAGATTCC GTCCTGCAGG CTGCCCGGCA GGCATACGCC CGGGAATATG ACGAGGAAAC 3000 

AACAGAAACC GCTGATTTTG GCTCTTACGA AGTAAAAGGC AATAAGGTTG AGTTTGAAGT 3060 

ATTCAATCCT GAAGACCGTG CGTACGACAA AGTGACCGTC ACGGTTGGTG CTGACGGTAA 3120 

TGCCACCGGC GCCAGCGTTG AATTTATCGG AAAATAGCCG GTATGTCGGA CTGCCACCCT 3180 
GTTTTATTGC CCGAAGGCCC TTTCTCACGC GAACAGGCGA TGGCTGTCAC AACAGCTTAC 3240

CGCAATGTGC TTATTGAAGA TGACCAGGGA ACGCATTTCC GGCTGGTTAT CCGCAATGCC 3300

GAAGGGCAGC TACGCTGGCG GTGCTGGAAT TTTGAACCTG ATGCCGGAAA ACAGCTAAAT 3360

TCGTATCTCG CCAGTGAGGG AATTCTCAGG CAATAAACGT CTTCATTTCA TCCATCAGGC 3420

CGCGTCTTCT CCGGGAGACG CGGCCTTTTC GTTTATACCG CTAATTCATT CATAAGGAGC 3480

AAAGTATGCA ATTAGCCAGT CGTTTTGGTC ATGTAAATCA GATGCGTCGG GAGCGCCCAC 3540

TGACACGCGA AGAACTGATG TACCACGTCC CGAGTATTTT TGGAGAAGAC CGGCACACCT 3600

CCCGCAGTGA ACGGTATGCG TACATTCCGA CCATCACCGT CCTGGAAAAT CTGCAGCGGG 3660

AAGGCTTTCA GCCGTKCTTC CCCTGCCAGA CCCGTGTGCG CGACCAGAGC CGCCGGGAAT 3720

ATACCAAACA TATGCTGCGT CTGCGGCGGG CCGGACAGAT AACCGGTCAG CATGTGCCTG 3780

AAATTATTCT GCTCAACTCC CATGACGGTT CATCCAGCTA CCAGATGTTA CCCGGATATT 3840

TTCGTGCCAT TTGTACCAAT GGCCTGGTCT GCGCTCAGTC GCTGGGAGAA GTCCGGGTGC 3900

CACACCGGGG AAACGTGGTG GACAGGGTCA TAGAAGGTGC TTACGAAGTG GTGGGCGTGT 3960

TTGACCTGAT TGAGGAAAAG CGTGATGCCA TGCAGTCGCT GGTCCTGCCG CCACCGGCAC 4020

GCCAGGCGCT GGCACAGGCG GCGCTGACTT ACCGTTATGG TGATGAACAT CAGCCCGTCA 4080

CCACTACCGA GATTCTGACG CCACGACGCG GGGAGGATTA GGGTAAGGAC CTGTGGAGTG 4140

CTTATCAGAC CATCCAGGAG AATATGCTGA AAGGCGGGAT TTCCGGTCGC AGTGCCAGAG 4200

GAAAACGTAT CCATACCCGG GCCATTCACA GCATCGATAC CGACATTAAG CTCAACCGGG 4260

CGTTGTGGGT GATGGCAGAA ACGCTGCTGG AGAGCCTGCG CTGATACCGT TTCCCTGAAA 4320

GCGCAGTCCT GTTCACGGCT GTCCCTTCCC CCAGACATTC CACCATTCAT TTACTTTTTA 4380

TAAGGAATAA TCTCATGACA ACCTCTTCGC ATAATTCCAC CACACCTTCT GTTTCCGTGG 4440

CCGCTGCATC AGGGAATAAC CAGTCTCAGT TGGTTGCCAC TCCCGTCCCT GATGAACAGC 4500

GCATCAGCTT CTGGCCGCAG CATTTTGGCC TCATTCCACA GTGGGTCACC CTGGAGCCCC 4560

GTGTCTTCGG CTGGATGGAC CGTCTGTGCG AAAACTACTG CGGGGGTATC TGGAATCTGT 4620

ACACCCTGAA CAACGGTGGC GCATTTATAG CACCTGAACC GGATGAAGAT GATGGAGAAA 4680

CCTGGATACT GTTCAATGCC ATGAACGGTA ACCGCGCTGA AATGAGCCCG GAAGCTGCCG 4740

GCATTGCCGC CTGTCTGATG ACGTACAGCC ATCATGCCTG TCGTACGGAG AATTATGCCA 4800

TGACGGTCCA TTATTACCGG TTGCGGGATT ACGCCCTGCA GCATCCGGAA TGCAGCGCCA 4860

TTATGCGCAT CATTGACTGA AAGGGGCGGG AATAATGCAA CAGATTTCCT TTCTGCCCGG 4920

AGAAATGACG CCCGGCGAGC GCAGTCACAT TCTGCGGGCC GTGAAAACCC TGGACCGCCA 4980

TCTTCATGAA CCCGGTGTGG CCTTCACCTC CACCCGTGCG GCACGGGAAT GGCTGATTCT 5040

GAACATGGCG GGACTGGAGC GTGAAGAGTT CCGGGTGCTG TATCTGAATA ACCAGAATCA 5100



WO 98/22575 



PCT/US97/21347 




GCTGATTGCC GGTGAAACCC TCTTCACCGG CACCATCAAC CGCACGGAAG TCCATCCCCG 5160

GGAAGTGATT AAACGCGCCC TGTACCACAA TGCCGCTGCC GTGGTGCTGG CGCACAATCA 5220 

CCCGTCCGGT GAAGTCACAC CCAGTAAGGC AGACCGGCTT ATCACCGAAC GTCTGGTACA 5280 

GGCACTGGGC CTGGTGGATA TCCGGGTGCC GGACCATCTG ATAGTCGGTG GCAGCCAGGT 5340

TTTCTCCTTT GCGGAACACG GTCTGCTTTA ACCCGTCACC GTCACAATCA CCTTGATATC 5400

ACTTCAGTTT CTCTTTCTCA GCTGTTTCTT ACTTTCACAT TCAGGAGGAC TATTGTCATG 5460

AAAATCATCA CCCGTGGTGA AGCCATGCGT ATTCACCGTC AGCATCCTGG ATCCCGTCTT 5520 

TTTCCGTTCT GTACCGGTAA ATACCGCTGG CACGGTAGCA CGGATACATA TACCGGCCGT 5580

GAAGTACAGG ATATTCCCGG TGTGGTGGCT GTGTTTGCTG AACGCCGTAA GGACAGTTTT 5640

GGCCCGTATG TCCGGCTGAT GAGCGTCACC CTGAACTGAA TCAGGACGGG CATTCAGAAG 5700 

AGCAGAATTA TCGCCACCAC CGGAGCATTC TTAACCAATT TTCTGTGAGG ATTTTATCGT 5760

GTCAGACACT CTCGCCGGGA CAACGCATCC CGACGATAAC AACGACCGCC CCTGGTGGGG 5820

GCTACCCTGC ACCGTGACGC CCTGTTTTGG GGCACGTCTG GTGCAGGAGG GTAACCGGTT 5880 

GCATTACCTT GCAGACCGCG CCGGTATCAG AGGCCGGTTC AGCGACGCGG ATGCGTACCA 5940

TCTGGACCAG GCCTTTCCGC TGCTGATGAA ACAACTGGAA CTCATGCTCA CCAGCGGTRA 6000 

ACTGAATCCC CGCCATCAGC ATACCGTCAC GCTGTATGCA AAAAGGCTGA CCTGCGAANC 6060

GACACCCTCG GCAGTTGTGG CTACGTTTAT ATGGCTGTTT ATCCGACGCC CGAAACGAAA 6120 

AAGTAACTCT CCAGAATAAC CTTCTGCCCC GGCCTGGTGC TTTCACCACG CCACTTTTCC 6180 

ATTTTTCATC TCTGCATATC AGGAAAATCT TCAGTATGAA CACATTACCC GATACACACA 6240

TACGGGAGGC ATCGCATTGC CAGTCTCCCG TCACCATCTG GCAGACACTG CTCACCCGAC 6300 

TGCTGGACCA GCATTACGGC CTCACACTGA ATGACACACC GTTCGCTGAT GAACGTGTGA 6360 

TTGAGCAGCA TATTGAGGCA GGCATTTCAC TGTGTGATGC GGTGAACTTT CTCGTTGAAA 6420

AATACGCACT GGTGCGTACC GACCAGCCGG GATTCAGCGC CTGTACTCGT TCTCAGTTAA 6480

TAAACAGTAT TGATATCCTC CGGGCCCGCC GGGCAACCGG CCTGATGGCC CGCGACAATT 6540

ACAGAACGGT AAATAACATT ACCCTGGGTA AGCATCCGGA GAAACGATGA AACTTTCCCT 6600

GATGCTGGAA GCCGACAGAA TTAATGTGCA GGCACTGAAC ATGGGGCGAA TTGTCGTTGA 6660 

CGTCGATGGT GTTAATCTCA CTGAACTGAT TAACAAGGTC GCTGAAAACG GTTATTCACT 6720 

CCGCGTGGTG GAGGAATCCG ACCAACAGTC AACCTGCACA CTACCACCGT TTGCAACCCT 6780

TGCCGGCATA CGCTGCAGTA CCGCACATAT CACGGAAAAG GATAACGCCT GGCTGTACTC 6840

GCTGTCACAC CAGACCAGTG ACTTCGGTGA ATCAGAATGG ATTCATTTCA CAGGTAGCGG 6900 

ATATCTGTTA CGTACCGATG CGTGGTCATA TCCGGTTCTG CGGCTTAAAC GCCTGGGGCT 6960 

GTCAAAAACG TTCCGTCGTC TGGTTATCAC ACTTACCCGA CGTTATGGCG TCAGTCTCAT 7020 
TCATCTGGAT GCCAGCGCTG AATGCCTGCC GGGTTTACCC ACTTTCAACT GGTAACCAGG 7080

AACAACATGA AATCATTAAC CACGGAAAGC GCACTGGATA TTCTGATTGC GTGGCTGCAG 7140

GACAATATCG ACTGCGAATC GGGAATTATC TTTGACAACA ATGAGGATAA AACGGATTCA 7200

GCAGCACTGT TGCCCTGTAT CGAA2AGGCC AGAGAGGATA TCCGTACCCT GCGCCAACTG 7260

CAGCTTCAGC ACCAGAACCG GTGAGTGTCA CTCATCATCT CACTCACCAG ACTTCATTCC 7320

ACTSACGCCA GCCTGAACAC GGCTGGGGTT TTCATTTATC TGGAAAAAGG AATATCGATT 7380

ATGTCTGAAA TCACAGTCTC CCGTCCGGAA GTGGTCAACG AGAATACGGA CGTTATCTGC 7440

TCCACCTCAG TCAGGTACAG GTCACTGGAA TATGATAATT TTCGGGAAAT CAGCGAAGCG 7500

AACATTCTGA GCACATTTGA ACAACTGCAC CAGAACAAAG ATGAAGTCTT TGAACGGGGA 7560

GTGATCAACG TCTTGAAAGG GCTGAGCTGG GATTACAAAA CCAACTCACC CTGTAAATTT 7620

GGCAGTAAAA TTATCGTCAA CAATCTGGTG AGATGGGACC AGTGGGGATT TCATCTTATC 7680

AGTGGAATGC AGGCAGATCG CCTGGCTGAC CTGGAAAGAA TGTTGCATCT GCTCAGCGGT 7740

AAACCGATCC CCGACAACCG AGGGAATATC ACCATTAATC TGGATGACCA CATACAGTCC 7800

GTTCAGGGTA AAGGACGCTA TGAAGATGAG ATGTTCATCA TTAAATACTT TAAGAAGGGA 7860

TCTGCACACA TCACTTTCAA AAGGCTGGAG CTGATTGACA GAATTAACGA TATAATAGCC 7920

AGGCACTTTC CTTCTGTGCT CTCAGCCTGA CCCCGAGTTT GATTCCCTTT CGATATCAAA 7980

AGGGACTGCG GGTACAAAAG AGGGTACATC TTTCACCAAA CCAAACAAAA TAAACTAATA 8040

TCAACATGAT AGAAGCATTC TTCGATTCCG AGTCCGGCAC CAAATTCATA TAAACGGACC 8100

TCCACGGAGG TCCGTTTTTC GTTTCAGGAC GCCACGATTT AAGCGTCCTG GCGCCAAATC 8160

AATTCTACOG AACTCAACCA GATTCTCCCC ACATCACCAG CAATTTGCGG GCATATCCCA 8220

ATTCGGGAAA ATTTGTTTCT GAGCTATAGC GCTGACTGAG GTGAAATGTC GTGCGGCCCC 8280

GTGATGCTGT TGAAMGTCAA ATGACGTCAT CAGGAGCGTA ACGCACCCAT AAAGCACAAC 8340

ATCGGGCAGA ACGCCAACTG ATGAGATTTT CTGAATGAGA ACAAAGAGAA ATGTATCAGT 8400

CCGTTTGCTC ATGCAAAGAC TAACAATCCA TTAAAATAGT AAGCGCTCCG GACAATTTTC 8460

CATGGATTAT TTTCTGAACA TTTTTCTTTG GCAAAGATGA TGAATTTTGA TGGTAAGGAA 8520

AATTACTTCT GGTTCTCAGT AAAATCCTTT CGTAATACTA TGTAATCAAG AAGTTTATGG 8580

CTAGTAAAAA TAACGTCTTG CATTGACCAA TAATATGTAA ATAAACCCAT CTATAGATGG 8640

AAAAAATAGG TTATGGAATT ATCATTGCAT CATTCCCTTT TCGAATGAGT TTCTATTATG 8700

CAACAACCTG TAGTTCGCGT TGGCGAATGG CTTGTTACTC CGTCCATAAA CCAAATTAGC 8760

CGCAATGGSC GTCAACTTAC CCTTGAGCCG AGATTAATCG ATCTTCTGGT TTTCTTTGCT 8820

CAACACAGTG GCGAAGTACT TAGCAGGGAT GAACTTATCG ATAATGTCTG GAAGAGAAGT 8880

ATTGTCACCA ATCACGTTGT GACGCAGAGT ATCTCAGAAC TACGTAAGTC ATTAAAAGAT 8940
AATGATGAAG ATAGTCCTGT CTATATCGCT ACTGTACCAA AGCGCGGGTA TAAATTAATG 9000

GTGCCGGTTA TCTGGTACAG CGAAGAAGAG GGAGAGGAAA TAATGCTATC TTCGCCTCCC 9060 

CCTATACCAG AGGCGGTTCC TGCCACAGAT TCTCCCTCCC ACAGTCTTAA CATTCAAAAC 9120

ACCACAACGC CACCTGAACA ATCCCCAGTT AAAAGCAAAC GATTCACTAC CTTTTGGGTA 9180 

TGGTTTTTTT TCCTGTTGTC GTTAGGTATC TGTGTAGCAC TGGTAGCGTT TTCAAGTCTT 9240

GAAACACGTC TTCCTATGAG TAAATGGCGC ATTTTGCTCA ATCCACGCGA TATTGACATT 9300 

AATATGGTTA ATAAGAGTTG TAACAGCTGG AGTTCTCCGT ATCAGCTCTC TTACGCGATA 9360

GGCGTGGGTG ATTTGGTGGC GACATCACTT AACACCTTCT CCACCTTTAT GGTGCATGAG 9420 

AAAATCAACT ACAACATTGA TGAACCGAGC AGTTCCGGTA AAACATTATC TATTGCGTTT 9480

GTTAATCAGC GGCAATACCG TGCTCAACAA TGCTTTATGT CGGTAAAATT GGTAGACAAT 9540

GCAGATGGTT CAAGCATGGT GGATAAACGT TATGTCATCA CTAACGGTAA TCAGCTGGCG 9600 

ATTCAAAATG ATTTGCTCCA GAGTTTATCA AAAGCGTTAA ACCAACCGTG GCCACAACGA 9660 

ATGCAGGAGA TGCTCGAGCA AATTTTGCCG CATCGTGGTG CGTTATTAAC TAATTTTTAT 9720 

CAGGCACATG ATTATTTACT GCATGGTGAT GATAAATCAT TGGATCGTGC CAGTGAATTA 9780

TTAGGTGAGA TTGTTCAATC ATCCCCAGAA TTTACCTACG CGAGAGCAGA AAARGCATTR 9840

GTTGRTATCG TGCGCCATTC TCAACATCCT TTAGACGRAA AACAATTAGC CAGCACTGAA 9900 

CACAGAAATA GATAACATTG TTACAGTGCC GGAATTGAAC AACCTGTCGA TTATATATCA 9960 

AATAAAAGCG GTCAGTGCCC TGGTAAAAGG TAAAACAGAT GAGTCTTATC AGGCGATAAA 10020 

TACCGGCATT GATCTTGAAA TGTCCTGGCT AAATTATGTG TTGCTTGGCA AGGTTTATGA 10080

AATGAAGGGG ATGAACCGGG AAGCAGCTGA TGCATATCTC ACCGCCTTTA ATTTACGCCC 10140

AGGGGCAAAC ACCCTTTACT GGATTGAAAA TGGTATATTG CAGACTTCTG TTCCTTATGT 10200 

TGTACCTTAT CTCGACAAAT TTCKCGCTTC AGAATAAGTA ACTCCCGGGT TGATTCATGC 10260

TCGGGAATAT TTGTTGTTGA GTTTTTGTAT GTTCCGGTTG GTATAATATG GTTCGGCAAT 10320 

TTATTTGCCG CATAATTTTT ATTACATAAA TTTAACCAGA GAATGTCACG CAATGCATTG 10380 

TAAACATTGA ATGTTTATCT TTTGATGATA TCAACTTGCG ATCCTGATGT GTTAATAAAA 10440 

AACCTCAAGT TCTCACTTAG AGAAACTTTT GTGTTATTTC ACCTAATCTT TAGGATTAAT 10500 

CCTTTTTTCG TGAGTAATCT TAGGGCCAGT TTGGTCTGGT CAGGAAATAG TTATACATCA 10560 

TGACCCGGAC TCCAAATTCA AAAATGAAAT TAGGAGAAGA GCATGAGTTC TGCCAAGAAG 10620

ATCGGGCTAT TTGNCCTGTA CCGGTGTTGT TGCCGGTAAT ATGATGGGGA GCGGTATTGC 10680 

ATTATTACCT GCGAACCTAG CAAGTATCGG TGGTATTGCT ATCTGGGGTT GGATTATCTG 10740 

TATTATTGGT GCAATGTCGC TGGCATATGT ATATGCCCGA CTGGCAACAA AAAACCCGCA 10800 

ACAAGGTGGC CCAATTGCGT ATGCCGGAGA AATTTCCCCT GCATTTGGTT TTCAGACAGG 10860
TGTTCTTTAT TACCATGCTA ACTGGATTGG TAACCTGGCA ATTGGTATTA CCGCTGTATC 10920

TTATCTTTCC ACCTTCTTCC CAGTATTAAA TGATCCTGTT CCGGCGGGTA TCGCTGTTAT 10980

TGCTATCGTC TGGGTATTTA CCTTTGTGAA TATGCTCGGC GGTACCTGGG TAAGCCGTTT 11040

AACCACGATT GGTCTGGTGC TGGTTCTTRK TCCTGTGGTG ATGACTGCTA TTGTTGGCTG 11100

GCATTGGTTT GATGCAGCAA CTTATGCAGC TAACTGGAAT ACTGCGGATA CCACTGATGG 11160

TCATGCGATC ATTAAAAGTA TTCTGCTCTG CCTGTGGGGC TTCGTGGGTG TTGAATCCGC 11220

AGCAGTAAGT ACTGGTATGG TTAAAAACCC GAAACGTACC GTTCCGCTGG CAACCATGCT 11280

GGGTACTGGT TTAGCAGGTA TTGTTTACAT CGCTGCGACT CAGGTGCTTT CCGGTATGTA 11340

TCCGTCTTCT GTAATGGCGG CTTCCGGTGC TCCGTTTGCA ATCAGTGCTT CAACTATCCT 11400

CGGTAACTGG GCTGCACCAC TGGTTTCTGC ATTCACCGCC TTTGCGTGTC TGACTTCTCT 11460

GGGCTCCTGG ATGATGTTGG TAGGCCAGGC AGGTGTACGT GCCGCTAACG ACGGTAACTT 11520

CCCGAAAGTT TATGGTGAAG TCGACAGCAA CGGTATTCCG AAAAAAGGTC TGCTGCTGGC 11580

TGCAGTGAAA ATGACTGCCC TGATGATCCT CATCACTCTG ATGAACTCTG CCGGTGGTAA 11640

AGCCTCTGAC CTGTTCGGTG AACTGACCGG TATCGCAGTA CTGCTGACTA TGCTGCCGTA 11700

CTTCTACTCT TGCGTTGACC TGATTCGTTT TGAAGGCGTT AACATCCGCA ACTTTGTCAG 11760

CCTGATCTGT TCTGTACTGG GTTGCGTGTT CTGCTTCATC GCGCTGATGG GCGCAAGCTC 11820

CTTCGAGCTG GCAGGTACCT TCATCGTCAG CCTGATTATC CTGATGTTCT ATGGTCGCAA 11880

AATGCACGAG CGCCAGAGCC ACTCAATGGA TAACCACACA GCGTCTAACG CACATTAATT 11940

AAAAGTATTT TCCGAGGCTC CTCCTTTCAT TTTGTCCCAT GTGTTGGGAG GGGCCTTTTT 12000

TACCTGGAGA TATGACTATG AACGTTATTG CAATATTGAA TCACATGGGG GTTTATTTTA 12060

AAGAAGAACC CATCCGTGAA CTTCATCGCG CGCTTGAACG TCTGAACTTC CAGATTGTTT 12120

ACCCGAACGA CCGTGACGAC TTATTAAAAG TGATCGAAAA CAATGCGCGT CTGTGCGGCG 12180

TTATTTTTGA CTGGGATAAA TATAATCTCG AGCTGTGCGA AGAAATTAGC AAAATGAACG 12240

AGAACCTGCC GTTGTACGCG TTCGCTAATA CGTATTCCAC TCTCGATGTA AGCCTGAATG 12300

ACTGCGTTTA CAGATTAGCT TCTTTGAATA TGCGCTGGGT GCTGCTGATG ATATTGCTAA 12360

CAAGATCC 12368


(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 833 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GCACGGCACT CTGATGTANC TTTTATCTGT TCCCAGTGGA AGCATGCCCC ACAACTGAGT 60

CATTAAGTGT GGAAGAACAG TTTTGTCCCC GCCTGCAATC TCTCCCTTTC NAAAAACCAG 120

TATGTCGCCA TGCCTCGCCT TAATGGAGAG CGCTGAACCA TACCTTCTTT TTCCCAGTAA 180

TAACAGGTAA TAGCGTGCCT GGTAATCCGT TAGCGCCAGC GCCTCCGCAA TTTCTGCGGT 240

TTTCCCTCCA TTATGCCTGT TCAGAAATYC CAGTATTTCA TTCTTCATAT ATTCACTCAT 300 

CTCACTGTAA CAAAGTTYGT YCGAATAATA AAAATCATGG TTTCTGTTAT CAAGGGAAAG 360 

GTATTTTTAT TCTCTGTGTT TGCTTTATTT GTGAAATTTA GTGAATTTGC TTTTTGTTGG 420

CTTTATTTGN ATGTGTGTCA CATTTTGTGT GTTATTTTTC TGTGAAAAGA AAGTCCGTAA 480

AAATGCATTT AGACGATCTT TTATGCTGTA AATTCAATTC ACCATGATGT TTTTATCTGA 540

GTGCATTCTT TTTGTTGGTG TTTTATTCTA GTTTGATTTT GTTTTGTGGG TTAAAAGATC 600 

GTTTAAATCA ATATTTACAA GATAAAAAAC TAAATTTAAC TTATTGGGTG AAGAGTATTT 660

GCGGGCCGGA AGCATATATG CAGGGGCCCG ACAGAAGGGG GAAAGATGGC GCATGATGAA 720

GTCATCAGTC GGTGAGGAAA TGCGTTTTTG CTGAATATAC GCGAGAGCGT AYTGTTGCCC 780

GGCTMTATGT CTGAAATGCA TTTTTTTTTA CTGATAGGTA TTTCTTCTCA TTC 833 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2916 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

TGCACCATCA CTGATACCAC CGGGACCCCG GATTTTATCC GGTCCCCGCG GACTGACAGG 60 

GTTTGTGACA CCTGAGTCAT ATCCGATGTA AACTTCATTT TCACGGGTTG TACAGGAAAA 120 

CTCCCCTGTG CCATTGAGTT CTGATGTGTG CCCTTCGCCA CAACTCCCAC CGTCACGGCA 180 

CCAGTTGCAT CTGACGCCGA CCAACTGCTG AGAGCCATGC CGTTTCCGGC TTTGTCGACA 240

ACGCATGCTG CAGTTCCCAG CGATGCGAAC TGGTCTGGCA TGCATTCACG AACCAACAGC 300

AGTGGTGCTA CGTCCGGATG CAATTCGCAT GAGCTCCAAC CGCGGTTGTA AGTTCAGCAG 360 

CCCGGGCCTC TGCCCCCGGC ACAGTCGCAT AAGTATTCGA TACCGTGCGA CACCATTACC 420

TTCAGGATAC GCCACGGACC CGTCACCCTA CGAAAACGCC GGAGCACCGG CAATCAGCAA 480

AGGCAGCAGT GATAAAAGAC TGATATATTT CCTGTCATTA TTTTTCATAT TAATTTAACT 540

CCTGATTAAC CGGTTTTTAT TGATATGAGA AAGTAATAGT TGCAATAGCC TTCACACTTC 600 

CAGGTGTAGT TGCATCAGCA ATTTTTATAT AATTGGCTCT TAAATTGATA TGTGGATTTA 660 

CCTCTCCCCT GTAATCGGAG AAGTGCCATT GACTGCCATT TCCTTTCACA GGGGAGTCTT 720 
CACCATAGCT GATGGCAGTT ACATCACTGT CTTTATATAG CCTGATGCCA AATCCTTTTG 780

CAGTGGATTC ACTGCTTAAG GTCAATATAT CTGTTCTGTT CACTGGCTGT GATGCATCTG 840

TCAATGTAGC ATAAACATCA ATTCCATCCG GGCATTGTAG GTGTATGTCA ATTTTACCTC 900

CCTGTATTTC TTTATACAAA GATGTGAACT GTGATTGATA TACGGTATTT AATGGCACCA 960

CATAGTTTTT TTGCCCCATG GTACATGTCT GACTCTGTAC CTGAATGCGC CCACCATTTA 1020

ACATAACAGG TGCTGTCAGT CCTTTATTAT TTAAACTTGT ACGTTTTGCT TCCAACAAAA 1080

TAGTACCAAG CTGCCTGGTG GGTATTGTTA TATATCCATT GGGTAATCTT CCCGTTGCGA 1140

CAAAAGCAAC AAACAAACGA GCTCCGAAGC TTGCTGTCGC ACCGTTATAA GTATTGGGGT 1200

TTGTATTGGC ACCTACAGGG TCAATATATA TACCTGAGCT ATTTATGGGG ACCAGAGGCG 1260

TTGCGGGCCA ATAGCCCGCC ATGCCAATAA TAATACCCAG TCCGGATACA CCAATATCAT 1320

AGATATCAAA ATCAGATGAA TCACGGCTGT TTCCTTGATG GAAAGTATAC GTAATACTTC 1380

CAATTTTAGG CAGTGCGGGT GTAAACTTTC CACGCATCAG AGCGATGGCA CCGCCATTAA 1440

AAACATACTG GTTACTTGTT CCCGCCAGCT CTCCTATCAC CCGGGGATAG GTATGGGCAT 1500

CAGCAGGACC AATCACAACA CCTGGCAATG TGGATGTATT AACCGCTATC TGCGAAGGCA 1560

CATAATCATC CGGACCCGCT ACCGCCAGCT TAGGGAGTAA AATTAAAAAC AATGGTATGA 1620

AAAAGATTCT TTTCATGTTT TTTCCTGATT AGGGTGCTGT ATACACAGAA CAGGAACGAG 1680

CTGAGATTGC ATATCATCTT TATTGTGTGC AACATGATAT ACAAATGAAC ATCTGTCTTT 1740

ATTATCTGGT CCCCATACAA CGCTGAGATG ACCTTTTTCA GGGAGTCCCC TGGTAAATAC 1800

CTTCCCGGCC TGAGCGACAT ATCCGGCCAA CTGTCCATGT TCATCCAGAA CTTCAGAAGC 1860

CATTGGAGGG GGATTGCCAG TAGACATACG AATATCAAAT AACAGACTTC TTCCTGTTTT 1920

AGTGTCAAAT TTYACTAACG TGGCGCTATT AGCACGAGGA ATGATTTCCT GCTCCGTCGC 1980

CGATAATTCA ACATTCAAAT CTAAATTGGA GGGATCGATG CTAATTTGAT TTTTCTCATA 2040

GGGTGTAACA TAAGGAACAA TACCATTTCC CCAAAAATCC AGACGACTAC CAGAGGCATT 2100

ATTGATGGCA GCCCCCTGAG CTCCTTCAGC ATGGATAATG GCAAAAGTAT CACTCAGGTC 2160

ATTACTCAAT GTCACTCCAT AGGGGTGTGC GACCACCGCT CCCGACGCAC CAAATGACCT 2220

TTGATTATTA TTCTGAGTAT CATGCCCGAC TGTTGTGGTT ATATTTACAT AAGGTGAACG 2280


7\ rn a 7\ i — " f* t — A 

A I AACllll A 


r i cattgc a l 




L ( -L ( ^ 1 1 1 ILL 
2340

ATAAGAGAAC TGATTATCCT CCCCGCCAGT ACCACTAATT GATGTCTGAA TACTATTTTT 2400

CTCTTCTTTG CTATAATTTA AAACAGTGGA AAACACCGGG CTTTGAACAC TTNCCTCCCA 2460

GAGGGAGAGT AAAATTAATA TAAAATCTGT CATCACGGCG TTGTTGCTCA TTATCTCTTG 2520

ACTGAGACAA TCCAATTTGA TAGCCGAGTT GTTTCCAGAA GTTGCTGTAC CCCATCTGGT 2580

ATTCATTACG ACTTCCTTTA TGTCCCCAGT AATTATAGGT TGTTCCTGTT AAATACATCC 2640

CACCCCATTT TTCACCTAAT TCCTCGTTGA TTGAAATCTG GAATTGATTC CTGGGACGAT 2700

AAAACGGTGT ACTTTTTACA GAAACATCAT CAATAAACGG GTTGTGATTA GCTGATAGCG 2760 

CATCCTTCAG ATGATAAAAA TCTTTTGATG AATAACGATA AGCCGCCAGA GTTATATTTG 2820 

TGTTTTGAGG GCTGGGAATA TTGGATGGCT AATAACTTGG AGTNGCAGGA CTAATAAACC 2880 

TTTTACGGCG GTTACACCGG GAATACGNGG AAATGC 2916

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2677 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
ACCGCATCGC CAATCTCAGC GGCAGTGGTT TACATGTCTT CCGTGATGGA AGGTCATGGC 60 

ATCAGCTACC TCCATCTGCT CTCCGTGGTC ATCCCGTCCA CCCTGCTGGC GGTTCTGGTG 120 

ATGTCCTTCC TGGTCACTAT GCTGTTCAAC TCCAAACTCT CTGACGATCC GATTTATCGC 180 

AAGCGTCTGG AAGAGGGCCT GGTTGAACTG CGCGGTGAAA AGCAGATTGA AATCAAATCC 240

GGTGCAAAAA CGTCCGTCTG GCTGTTCCTG CTGGGCGTAG TTGGCGTGGT TATCTATGCA 300 

ATCATCAACA GCCCAAGCAT GGGTCTGGTT GAAAAACCAC TGATGAACAC CACCAACGCA 360 

ATCCTGRTCA TCATGCTCAG CGTTGCAACT CTGACCACCG TTATCTGTRA ARTCGATACC 420

GACAACATTC TCAAYTCCAG CACCTTCAAA GCAGGTATGA GCGCCTGTAT TTGTATCCTG 480

GGTGTTGCGT GGCTGGGCGA TACTTTCGTT TCCAACAACA TCGACTGGAT CAAAGATACC 540 

GCTGGTGAAG TGATTCAGGG TCATCCGTGG CTGCTGGCCG TCATCTTCTT CTTTGCTTCT 600 

GCTCTGCTGT ACTCTCAGGC TGCAACCGCA AAAGCAYTGA TGCCGATGGC TCTGGCACTG 660

AACGTTTCTC CGCTGACCGC TGTTGCTTCT TTTGCTGCGG TGTCTGGTCT GTTCATTCTG 720

CCGACCTACC CGACACTGGT TGCTGCGGTA CAGATGGATG ACACGGGTAC TACCCGTATC 780

GGTAAATTCG TCTTCAACCA TCCGTTCTTC ATCCCGGGTA CTCTGGGTGT TGCCCTGGCC 840 

GTTTGCTTCG GCTTCGTGCT GGGTAGCTTC ATGCTGTAAT GACCCATYGC GGGGCGTTCA 900 
CGCCCCGCTT TCTTTCCCGC CGACTAACAT CCTTTCCCCG TCCGTTGTAT AGTGACCTCT 960 

CTCTTGCGGT TCCATCTGTT CTTGCGAGGT GTTTATGCTT GATGAAAAAA GTTCGAATAC 1020 

CACGTCTGTC GTGGTGCTAT GTACGGCAC Z GGATGAAGCG ACAGCCCAGG ATTTAGCCGC 1080 

CAAAGTGCTG GCGGAAAAAC TGGCGGCCTG CGCGACCTTG ATCCCCGGCG CTACCTCTCT 1140 

CTATTACTGG GAAGGTAAGC TGGAGCAAGA ATACGAATGC AGATGATTTT AAAAACTACC 1200

GTATCTCACC AGCAGGCACT GMTGAATGCC TGAAGTCTCA TCATCCATAT CAAACCCCGG 1260
AACTTCTGGT TTTACCTGTT ACACACGGAG ACACAGATTA CCTCTCATGG CTCAACGCAT 1320

CTTTACGCTG ATCCTGCTAC TTTGCAGCAC TTCCGTTTTT GCCGGATTAT TCGACGCGCC 1380

GGGACGTTCA CAATTTGTCC CCGCGGATCA AGCCTTTGCT TTTGATTTTC AGCAAAACCA 1440

ACATGACCTG AATCTGACCT GGCAGATCAA AGACGGTTAC TACCTCTACC GTAAACAGAT 1500

CCGCATTACG CCGGAACACG CGAAAATTGC CGACGTGCAG CTGCCGCAAG GCGTCTGGCA 1560

TGAAGATGAG TTTTAGGGCA AAAGCGAGAT TTACCGCGAT CGGCTGACGC TTCCCGTAAC 1620

CATCAACCAG GCGAGTGCGG GAGCAACGTT AACTGTCACC TACCAGGGCT GTGGTGATGC 1680

CGGTTTCTGT TATCCGCCAG AAACCAAAAC GGTTCCGTTA AGCGAAGTGG TCGCCAACAA 1740

CGAAGCGTCA CAGCCTGTGT CTGTTCCGCA GCAAGAGCAG CCCACCGCGC AATTGCCCTT 1800

TTCCGCGCTC TGGGCGTTGT TGATCGGTAT TGGTATCGCC TTTACGCCAT GCGTGCTGCC 1860

AATGTACGCA CTGATTTCTG GCATCGTGCT GGGCGGTAAA CAGCGGCTTT CCAGTGCCAG 1920

AGCATTGTTG CTGACCTTTA TTTATGTGCA GGGGATGGCG CTGACTTACA CGGCGCTGGG 1980

TCTGGTGGTT GCCGCCGCAG GKTTACAGTT CCAGGCGGGG CTACAGMACC CATACGTGCT 2040

CATTGGCCTC GCCATCGTCT TTACYTTGCT GGCGATGTCA ATGTTTGGCT TKTTTACTCT 2100

GCAACTCCCC TCTTCGCTGC AAACACGTCT CACGCTGATG AGGAATCGCC AACAGGGCGG 2160

CTCACCTGGC GGTGTGTTTA TTATGGGGGC GATTGCCGGA CTGATCTGTT CACCYTGCAC 2220

CACCGCACCG CTTAGCGCGA TTGTGCTGTA TATCGCCCAA AGCGGGAACA TGTGGCTGGG 2280

CAGCGGCACG CTTTATCTTT ATGCGCTGGG CATGGGCCTG CCGCTGATGC TAATTACCGT 2340

PGCTTGCTGC CGAAAAGGGG CCCGTGGATG GAACAAGTCA AAACCGCGTT 2400

TGGTTTTGTG ATCCTCGCAC TGCCGGTCTT CCTGCTGGAG CGAGTGATTG GTGATATATG 2460

GGGATTACGC TTGTGGTCGG CGCTTGGTGT CGCATTCTTT GGCTGGGCCT TTATCACCAG 2520

CNTACAGGCC AAACGCGGCT GGATGCGCGT GGTGCAAATA ATCCTGCTGG CAGCGGCATT 2580

GGTTAGCGTG CGCGGACTTC AGGATTGGGC ATTTGGTGCA ACACATACCG CGCAAACTCA 2640

GACGCATCTC AACTTTAGAC AAATCAAAAC AGTAGAT 2677



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 537 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
ATCCTGATGA CGCCGTAAAT GTGCATTTGC CAGGATTGCC GCATAGAGGG CACGAAGAAA 60 
AGGTCGGTTG TCAGGATGTA TCCAGATGAT TCTGCCACTG AAACCTTCAG GGATAAGACG 120 
ATTGCCAACT GCCAGTCCTT TAAGGGCAGC ATTCAGCGCC TTACGCGGGG CATTCTGCTC 180

CAGAAATACG TATGCCAAGT GAGCGTGTAC ATCAATAAAG TCATTCTCCT GTCGGGCAAG 240

GCGCCTGAGT TTGTTGATGT AACTTGTTTC GGTGATTTCA TCCGCATCGT ATGCATCAAT 300 

CAGTTCTTCA AACTCATCCA GCAACGAGCC AAACCAGGTT TCCGGAAATA TGAAACAGCC 360

CTGGTTATCG TTCACTTCAA AGCGTAATTT GCCAGTCATA TTCTGAACCT GTAAAAAAGG 420

ATAGACCATA ATCTGCAGGC TATAAAAATT GTGGATGCCT GGCATCGGGT GTCCTTTTAT 480

TGTCCGGGAT TAACGTTGCC CATGATAATA CAGTGAATCC NGTTCTGTGG TAAGACG 537



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

CGCTCGAGCA CCAGATTCAC TGACATGCGC AAACTCATGT GTAAATCCTG TCTGGGCATC 60 

TATCTCAAGT AACAGTTCCG TTAAATCTAC CGGTGGGAGT AGCTGTTTGA TCCGATTATT 120 

TAGACGAAGC AATGATGGTG GCTCTTCCTG TTTCTCCAGA CAACTGATAG TCAGGGATGG 180 

ATATTTACCT TCATTACAGA TATGAACTTC CGCATTCTTT TCAAATCGTG ATGCCAGGCT 240 

TTCCAGGTCT CATCCAGCTG AATAGCCAGT TGTTGCACAC CTTTACGTCC ATCGACAGGA 300 

TGTCCCAGTG CCCGACAGAC AGGAATACGC TGAGTCTGCC ACTCTTCACC TTGCAACAAC 360

TTCTCGCGAG GATCTCCCCA GCGATCACTG TTTTCAAGCC CAGATGTCCC CGGCGGCGCA 420

RTGCATCCTG AAGGCGTTCC AGCAAACATA GTGAATAACC TGCACGCTGT ATCCCGTCCC 480

TCCGCATCGT ATACGAGGCG TTTCCAGGGA CCGGTGATAA TATGTTCAGC GCATCATCAA 540

GGATGCGCTT TTTCGAACCA TTCAGTTCTG CCAGATAATG AATCGCAGCC AGTACATGTC 600 

ACCTGCCGGT GCCGCACGGA AATGCAGGTC CCGCAACACC GCCGGAAGAA AACGTTTAAC 660

CCGACCGTAC TGCTCAACCA TTTCGTCATG GAAATTATTG TTCTGTGGAC GAGCAAGTTC 720

ATTAACCTTG CTTACAGATT CTGCCAGTCT GTTTTTGGGT ACGCACTTGA AGATAACCTG 780

CCTGAGATCT GGGACATCTG TATTATCATC CAGCAACAAT GCACATGCCC GCGCCAGTAA 840

CAATGCGGCC TGATCAAGAT CTTTCAGTGT CCTGAGTCTT TTTTTTTGCC CGGTTTTCTT 900 

TGCTTCGCGG ATAATGTCCA GAATTAGCAT ATCAAGCACA TCAACGGCAT CGTCTAATGC 960 

CGTTATTTCC TGTGCTTTAA CGAATGCAGT AAGTACAGCA AGCTTTCTCT GCTGTGGCAT 1020

TCGAGCGATA TATTTTACCG ACGCCATGCC AGCATGAACG AGCCAGATTA CGCNTTGGNA 1080

ATGGTCAGGC AGACCGGGAA AAGTTCCAGT CGGGNAAAAC TCCAAGAA 1128






WO 98/22575 



PCT/US97/21347 




(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2311 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:



GGNTGATAAA AATCYTTTGA TGAATAACGA TAAGCCGCCC AGAGTTATAT TTGTGTTTGA 60

GGCTGGAATA TTGATGCTAT AACTTGAGTG CAGACTATAA CCTTTACGCG TTACACCGGA 120

ATACCTGAAT GCTGTTCTGG ACAATGTAAT GTCAGATGCT ATAGCACCCA GATGGGTATT 180

AAAGGCCAGG CCAGCTAACC CCGCTGTATA TCCTGAAGCT GTGGTAAGAC CACTGTTTAA 240

AGTAATATCA TTCGTCAGGC CGTATTGATA GGTGCCTTGT GCTATTAAAT CATTATATGT 300

TTTATTCGCA TAACGATACT TTCCCACTGA CATTTGCCAG CGACTAAATC CGGGACGAAT 360

GAGTTGAGCA ACGGCCGCAA AAGGAACCGT GAACATTCGT GTCTGGCCAT TAGACTCTGT 420

TATCTTAACG AGAAGGTCAC CAGCATATCC ACTGGGATAT AAATCATTGA TGACAAATGG 480

TCCGGCTGGC ACCGTCGTTT CATAGAGGAT ATGAGCATTT TGATAAATGG TTACTTTAGC 540

ATTACTGTTA GCTATTCCCC GGACAGCAGG RGCATAGCCA CGTAAAGAAC CGGGTAACAT 600

TCGTTCATCC GATGCTAACC TGACTCCCCG CAAACTGAGG CTATCCATTA GCTCACCATT 660

CGTATAAAAA TCCCCTAATG TGAATTGTGC TCTCAATGGG GCAAGGTCAT GCATTATACT 720

TGTTTCTATA TTCTGATATC CGGCAGGATA GCTATTATTC CAGCTCTCAC TGCCACGGTG 780

GCGCAAAGCC ATCCCCACAA ATTGAATCCA GCTTTTAATC CCAGATAAGT CTGTTCGTTA 840

CTCGTCCCGG AAGAGCTATA CTGGTAATAG TTAGCATCAT AGTTTATAAA TGCTGCAGGA 900

ACACCACTTT GCCACTGAGA AGGGGAAATA TATCCTCTTG GACGTGTATT CAGCAGTGCT 960

GCGGGATTTC GATATTCAAC CTTAAAGTCG ATAAGTCAAA ATTAATTCTG GCTGAAGAAA 1020

GCCCTGTTGA CGCCGGAAAG CAGGAGGTGT TTCCCGACAT AGTATCTTTG ACTAAATCAA 1080

TCAATGAAAG CAGCTCAGGC GTCAGGCATA ACGTCGGAGC ACCGGTATTG GCAGTACGTA 1140

AATACTGCAA ATCAGCCTTC CCCTTCCATA CATTATTAAC ATAAATATCA GAATAATACC 1200

TGCCCTCAGG CACAGGGTTA CCATGACTAA AGCGGCGGAT ATCAATAGCA TTTATCCCTT 1260

TATCCAAATG CAAAAACTCA GAATCAAACT CAGCCTCTTC AGCAGCAAAT GAATGGTTTG 1320

TTACTGTTAA CCCTAATGCA GCAAAAAGCA GAAGAGAACA ACGACAGTAA ATCAGGCATG 1380

ACAGATTATT AGCGTTCATT ATTACCTTAC TCCAGAACAG ATTCTCCTTG CTGATATCCT 1440

CCGTAATCAT TAACAATAAC CCAGGAAACT TTGCTGGTGG CGCAGTTCTG CCTTTAAGTG 1500

CAAATACTGT TGAAGAGAAA GGGGGAATCA TTCCACCATG TTCAACAGGC GTTAAGTGCT 1560

TATTCTGG7C AACTGCAATT TTG77GTAGG TTATGTAATA AGGTGTTGGA TTAACTGCTT 1620

TAATTCGGCG TTCCTCCTGG TGCCACGTAA CTTTCAGATA AGCATCATTT GGTGTTAACT 1680

TCAGGTGAGG AGGACGAAAG AAAAATTTTA TGCGACTACG AACAGCTAGT TGCAAATAAT 1740

TATTATTCCG CTGCTCTGAG TTATCGGAGT CTTTTTTTGC CCTGGGCTTT GCTGGAATAT 1800 

CCAGAACATT TAGATAGAAA AGAGATTGTC GGTCTTTCGG TAGTGACTCG CCTGTATATA 1860

CAATTCTGAG TGTTTGTCCT GATTTAGAGT CCATACGAAA TATTGGCGGA GTAATGATAA 1920

AAGGACGTGG ACTGAGTCAG GGGGAGCTGC TGCATCTCCA TCGYGAACCA GGACTGGACT 1980 

AATGCCGAGA TTTCATTGTG ATTATTTNAA CGTATGCTAA TAGTGTTTTG AGTGGCCGGA 2040

TAAACAACAC GGGTTCCCAT GATAACTAGA CTACCCTGAA CAACTGCAGA TACAGATAGA 2100

GTAAAAAAAA ACAGCACAAA CCTTAGGATG GTATCTCCAG AAGAAAGCAG GGCAGTATTT 2160 

CCTGCCCCAA AATACAAAAC CGTTTGTTAT TCGTAGGCGA TGGTATAATT GACTGTTGTT 2220 

TTTACATTGG CTGGAGTTGA TGTGCGGGTC GCATAATATT GAGCGATATA ACGTAATGTG 2280

GCATTACCAT CCCGACCAAT AGTTTCAGAA T 2311 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

TATTACCTGT GATTTTTCCG GGCGTAAATG GAGTCCCTAA AGTTATCGCA GTCCCAATAT 60 

TTCCTGCATT ACTGTTATAA AGATAAACGA GTAACCCATC AGAAGATGTG TTTGATGTAT 120 

TCTGAACTAA AATAGCATTG TNATAAGTGT TTGTTGCCGT TATCGTAACC TTCATTGTTC 180 

CCAGATTATA GGGACACCGC ATATTCACAG TAAACTCTTT TTCGTGANTT CCATTTTGAC 240

TCAGGGTCTG AATCTCTACA NCCTGCCAGT CAACAGTTGT GTTGCTTACA GTACAGGCAG 300 

GAATAATCAG TTTTCCTCTG AAGGTCAGAT TATCAACTGC ATGTACATGC TGAGACATTA 360

ACACTGCCCC CAGCATTACC GGAAGACACA AACCTCTTAT CTTTTTCATC TGAAATATCC 420

TGTACAAAAA TTTTGCTAAC GATATGTCAA TTCAAACGTG GCTGTTGCTT CATAATCACC 480

GGGTACCACA CTCTTCGTCC GCAGGGCTTC CGGCGTTGCC ACAACATACG CGCCGAAAGG 540

AAGCTCAAGA CTGTTTCCGG TAACCTTTTC CCCCTGGCCT TTGTTATGGG AGGTGCCGGG 600 

TTTCAGCAGA CTGCTGCCAT CGGTGTCCAG CAGTGCAATG CCTAACCGGC CAGCATTCAC 660 

TCCGGTTACC TTCAGATGGC CCGGGAGRCG CYNTCTTCCG TCCCCTTAAA GGTCAGGGTC 720 

ACAATTTTGC CAACTGCTGT TGCATGGCAG TTTTCCAGCC TGATGACAAA CGACTCTGTC 780 
GGCGAACGTC CGGGCGGATA CCAGAAATCC CTGGACGCCC GGGTTTTGAA GACGACATGT 840

TTATTCAGAC TGTCACCGGA CACATGGCAG GGTCTGTCAA GCACATTACC CCTGAATGGC 900

ACATCTGAGG CTATTGCCTG TCCGGCAGAC AGTGCGGCAA ACAGTAAAAG AGCGCCTGTG 960

CTTTTTATCA TCACATTCCC TTA2T 7 AT AT TTTATGCTCA GACGCAGCAT GGCCGGATTG 1020

CTCCTGGCAT CAGAATACTC AACCTCCTGT GGCGGCCTTT TCCTCCAGGC GGGCAAGCAT 1080

CTCCTCCTGG CGGCGGGTAA GGCGGGGACA GTAAAAAA 1118



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 562 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

TTCGTGGGTG AAATCGTAGG CCGCGCTTTT TTGCTGATCG GCCAGTTGAT GAATAGGGTG 60 

GCCAKGATCG GGATAAAACG TACAGGCAGC GATAAACAGA CAGCCCGGAT AGCGGTTGTT 120 

TTTAACGCAC TCCGATAACG CCTGATAACG TGCCAGCAAC TTTTGTTCGG CGGTTTGCGT 180 

TTCGTCCAGC ATCAGCTGAC GACGCCAGAC ATCTATCTGT TGGCTAAGAT AACGCAGCGC 240

ATCGTAGAGG ATTGCCTCTT TGTCTGGCCA GAAGCGGCGT ACTCGTCCAG TGGATAATCC 300 

ACACGTTCAG CAACCATCTC CAGCGTGGTG TTGGCAATCC CTTGTAATTC TAATAATTTC 360 

AGGGCTTCTC CCAGTACATC TTCACGTTGC ACGCTATTTT CCTCCGKCTT TCCCACTGCA 420 

ATGTTCGKTC ACGGTTGGCG ATCGCGCAAA TGTGCGCTGG AAGGTTTCAG CATCCATAAA 480 

GCCCGTGACG CGTGCTTGTG GATGCTCCTG GCCTTGGTCC GGTCAAAAAA GAGAATTTGT 540

CCGGTAGGGC CAAGGATATT AA 562 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 745 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

CCATCGCTTT ACCCCAGAAA AGTTAAGCCA TATAATGTGA GGGATATAAG TCGTCGTATC 60 

CGGTAAGTAC AGATAACCAC AACATAAGCT CATTCAGTAA ATTTTATCTC TGAACAAACG 120 

ACTATGGCAT GCTCATTTAT ACTATTCATA AGAAAGTGTG ATTATCTGTA AGCATTAACC 180 

ATCAAATCAT ATAACCATAC TAAACTGGCG GATCATCAGC ACCATTAGCA GGTAACTTAT 240

TGAAATTTTA TTATGTGTTT TTTGTTGATA ATTAATATGC AATATGAATT TGCTATTTTA 300

GAATCATGAA CACCATTTAA AATTACGATC ATTAACATCA TATAAAAATA TATTTTTACT 360

AAAACATGAA TTGTATATAT TTATTAGGTC AGGAAAATTA TCAGGGTTGA CCTTCAAATT 420

AACCTGAATG TTATGCTTAA TTTGACCCAG TAGTTCTTCA TGTGTAGATT TTATTATCCC 480

ATTATTATAA TCGATAAATG CACACATGTT TTTTATGAAT TCAAAACCTT TTCCTGTATA 540

CAGTTTAATG AATGCCACCA GAGCAAACAT TTCAAGATGT AGCCATAATG CTACGTTAGT 600 

TTTTTGCAAA GTATAAAAAA TTGAATTCGC CACTTTTTTA CTTATTGCTC TTTTATACTG 660 

TGATCGAGCA AGATTCAGTA GCGGAAGTCC TCGTTCAATA AATGAATGTG AAAAGACTGG 720

ATAAATTGAT GTCGGAAACC TTTCA 745 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 400 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GCGTTNATGC ATTTCGASAT TTTCCACTTC GTTCTGACGT TGCACTGCTT TGGCGTCATC 60 

ATTACGTAAC GTATCGAGGA AATCGAGGTA GCCCTGATCA ACATCTTTGG TGACGTAGAC 120 

GCCGTTGAAC ACCGAGCATT CAAACTGCTG GATATCCGGA TTTTCAGCGC GAACGGCGTC 180 

GATCAGATCG TTCAGATCCT GGAAAATCAA CCCGTCAGCA CCGATGATCT GGCGAATTTC 240

ATCAACTTCG CGACCGTGAG CGATCAGTTC CGTGGCGCTC GGCATATCAA TACCATAAAA 300

CGTTCGGGAA AGCGAATTTC CGGTGCCGCA GAAGCGAGGT ACACTTTCTT CGCTCCGGCT 360

TCGCGTGCCA TCTCGATAAT CTGTCAGAAG TGGTGCCACG 400
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 824 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

TGTCGACGAT GAGGCAGCCA GAGCATTAGA GCCGAAAAGA AGGGATGATG CCATGACTGC 60 

TGTTGCTATA AAATGTTTCA TATATTCTCC ATCAGTTCTT CTGGGGATCT GTGGGCAGCA 120 

TATAGCGCTC ATACTAGGGG TTTGAGGGCC AATGGAACGA AAACGTACGT TAAGGAGATA 180 

ATTCGTTGTT TATATTTAAA TTTAGAGCTC TCAGTTCCCC TTTTAAAATA TCCTCTGGCA 240

ACGTGAATGT ATAATGGCCC AACATATTGA TATGCCCGTG CATCAGGGGA GATAGCCGAG 300

CGATATCTTC ATCTATAATT TCTTCGCCAT TACGGCGCAT CCAGCTCAAC GCTTCCTCCA 360 

TATAGAGCGT GTTCCACAGA ACCACTGCAT TAGTAACCAG GCCCAGCGCC CCCAGTTGAT 420 

CTTCCTGCCC TTCACGATAA CGCTTTCTGA TCTCTCCGCG TTGTCCGTAA CAAATCGCAC 480

GAGCCACAGO GTGCGKTCCT TCTCCTCGAT TAAGCTGCGT CAGGATCCGC CGACGATAAT 540

CTTCATCATC AATATAATTG AGGAGATATA GCGTTTTGTT TACACGCCCT ACTTCCATAA 600 

TTGCCTGTGC CAGTCCTGAT GGGCGCGAGC TTTTCAGTAA AGAGCGAATG AGTTCTGACG 660

CATGAATTGT ACCCAACTTC AGGAACCAGC GGTTCGCATC ATCTCATCCC ACTGACTCTC 720 

CGCTTTTGAC AGATCTGCAT ATCCTCGGGC CAACTTATCC AGTACTCCGT AGTTTGCCGA 780 

TTTATTCACC CGCCAGAACA CCGCCTCACC TGCATCGGCA AGCC 824



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 911 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 



ACAAATCAGA CCAGTTAACC AGTCAGTCGG TTTTATGATT TCACTCACTA TACTTTGTTT 60

CATAAGGATT TCAGGATCTG CCAGACTGCG CAGAAATGAT GCTTACGAAT ACACAGTAAA 120

GGCAATGTCA TTTCCGATAC AGAGCCTGAC ATTGCCATAA TGAGCTATTT ATCTGAAAAA 180

CGACAGAATA TGATGTTTTA TCGTAACGTA ATTTTAAGTT CTCAACTTAT TGAGACATAT 240

TGTCTTTTTT ACCCATGTGG TCATTTTTCA TCCCATCCGT TTTGCTCATG TGTTCTTTCT 300

CCATTTTCTC TTTATCCATT GCATTTTTGC ACATACCATC CTTGCACATT TTATCATGCG 360

CGCTGGACAT GCTGCCTTTT ACTTCATGTG TTTTATCCAT TGTGTCTGCT GCCTGAGCAT 420

TGAACATGAA CAGCGCGGAT AGTACAGTTG CAGAAATAAT ATTTTTCATG GTTCTTCCTC 480

ATTTTTAACA ATTGTATCAA CAACCACCAA ACCAGTTATA ACCCTGGTCT TCCCAGTACC 540

CCCCCGGAAA ATGATTAGTG ACCTCTATAA CCTGAACATG CTTGGGGTTT TTATATCCCA 600

GCTTAGTAGG GATACGTATC TTTATGGGAT AGCCATATTC TTTTGGCAAT ACCCTGTTAT 660

TCCATGTCAA TGTCAGCAAT GTTTGTGAAT GTAGTGCTGT CGCCATATCA ATACTGGTGT 720

AGTAACCATC GACGCAACGA AAACTGACGT ATTTTGCCCG CATATCGGCA CCAATCAGCG 780

TCAGGAAATG CCGGAATGGT ATCCCTCCCC ATTTTCCTAT TGCACTCCAT CCTTCAACAC 840

NGATATGACG GGTTATCTGA CTCACATGCT GCATGTTATA CAATTCAGAC CAAAAACCAG 900

TTACGGGTTA T 911

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 463 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

NGGGGCAGGA TAATTGTATC CTGCCCNGTA TATAATTCTC AGCACAGGTG TTGACTAAAG 60 

AGCGTGAAAC TTTGCTATTA TGTCTTCGTA AGATTCACGG ACGGTTATAC TTGAGCCTGA 120 

TTCTGTGAAG TAAACAACAG CAGAAGCATC GTTGCCTTTT TCAATGTATG AAACATTCCA 180 

GTCATGGATA GCCACTGCGG GCTGACCATT ATCCCGACGG TGCGTCTTAA TGAATCGCGG 240 

AAGTAATTCT GCAATATCGT TAAAAACACC ATTTACGGTA TGAGTGATAC CACCAACGCA 300 

ATGTAGATGA GTTGACTCCG GGGTATCATT GTCTGCTTCT GCAAAGAGTA TAGCTGTCTT 360

GCTAATTGTA ACAGGCGCCT GTGARCGGGA TAATTCGAGA GAAATAAACC CGGATTCTGC 420

CATAAAAACT CCAGTTTGTG ATGTTATATC ATTTCATATG TTT 463
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 565 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

TTCTAACCTC TGACCAAAAA CAGAATTACG GTTGTTATGC TGCAGAACCT AATGACGTGC 60 

AACTGGCGCG CTATTTTCAT CTTGATGAAC GGGATCTGGC CTTCATTAAC CAACGACGGG 120 

GCAAACATAA TAGGCTGGGC ATTGCGCTTC AGCTCACCAC AGCCCGTTTT CTGGGAACAT 180 

TTCTGACGGA TTTAACTCAG GTTCTGCCTG GTGTTCAACA TTTTGTCGCG GTACAGCTTA 240

ATATCCACCG TCCAGAAGTT CTCTCCCGCT ATGCTGAACG GGACACTACC CTTAGAGAAC 300 

ATACTGCATT AATTAAGGAA TATTACGGCT ATCATGAATT TGGTGATTTT CCATGGTCTT 360

TCCGCCTGAA GCGTCTGCTA TATACCCGGG CGTGGCTCAG TAATGACGAC CGGGTCTGAT 420

GTTTGATTTT GCCACTGCAT GGTTGCTTCA AAATAAGGTA TTACTGCCCG GAGCAACCAC 480 

ACTAGTACGT CTCATCAGTG AAATTCGTGA AAGGGCAAAT CAGCGGCTGT GGAAAAAGCT 540

GGCCGCACTG CCGAACAAAT GGCAG 565 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 512 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

CGATGGCGTC CGGGGTGAAC GCCGGATAAG TTTAATTTAT CCGGTCAGGC AAAAGGCATT 60

AATCTGCAGA TAGCTGATGT CAGGGGAAAT ATTGCCCGGG CAGGAAAAGT AATGCCTGCA 120 

ATACCATTGA CGGGTAATGA AGAAGCGCTG GATTACACCC TCAGAATTGT GAGAAACGGA 180

AAAAAACTTG AAGCCGGAAA TTATTTTGCT GTGCTGGGAT TCCGGGTCGA TTATGAGTGA 240

GTCACTCCGG TGAGATGTCC GGTTATTTAT CTTTTTTGTG AATCTGGTGA TGCGTGGAAT 300 

GAAAGACAGA ATACCTTTTG CAGTCAACAA TATTACCTGT GTGATATTGT TGTCTCTGTT 360 

TTGTAACGCA GCCAGTGCCG TTGAGTTTAA TACAGATGTA CTTGACGCAG CGGACAAGAA 420

AAATATTGAC TTCACCCGTT TTTCAGAAGC CGGCTATGTT CTGCCGGGGG CAATATCTTC 480

TGGGATGTGG AATTGTTAAC GGGGCCAAAG TA 512 



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 827 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 



TTGCCGGTGC GGTTANTAGT GGCAGTGGTG TCTTTTGGTG TAAATGCTGC TCCAACTATT 60

CCACAGGGGC AGGGTAAAGT AACTTTTAAC GGAACTGTTG TTGATGCTCC ATGCAGCATT 120

TCTCAGAAAT CAGCTGATCA GTCTATTGAT TTTGGACAGC TTTCAAAAAG CTTCCTTGAG 180

GCAGGAGGTG TATCCAAACC AATGGACTTA GATATTGAAT TGGTTAATTG TGATATTACT 240

GCCTTTAAAG GTGGTAATGG CGCCAAAAAA GGGACTGTTA AGCTGGCTTT TACTGGCCCG 300

ATAGTTAATG GACATTCTGA TGAGCTAGAT ACAAATGGTG GTACGGGCAC AGCTATCGTA 360

NTTCAGGGGG CAGGTAAAAA CGTTGTCTTC GATGGCTCCG AAGTGATGCT AATACCCTGA 420

AAGATGGTGA AAACGTGCTG CATTATACTG CTGTTGTTAA GAAGTCGTCA GCCGTTGGTG 480

CCGCTGTTAC TGAAGGTGCC TTCTCAGCAG TTGCGAATTT CAACCTGACT TATCAGTAAT 540

ACTGATAATC CGGTCGGTAA ACAGCGGAAA TATTCCGCTG TTTATTTCTC AGGGTATTTA 600

TCATGAGACT GCGATTCTCT GTTCCACTTT TCTTTTTTGG CTGTGTGTTT GTTCATGGTG 660

TTTTTGCCGG TCCGTTTCCT CCGCCCGGCA TGTCCCTTCC TGAATACTGG GGAGAAGAGC 720

ACGTATGGTG GGACG 3CAGG CCTGCTTTTC ATGGTGAGGT TGTCAGACCT GCCTGTACTC 780
TGGCGATGGA AGACG2CTGG CAGATTATTG ATATGGGGGA ATACCCC 827 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

CCAGGGGCCC AAAATCCGTG TATCCACCTT TAAAGAAGGC AAAGTTTTCC TCAATATTGG 60 

GGATAAATTC CTGCTCGACG CCAACCTGGG TAAAGGTGAA GGCGACAAAG AAAAAGTCGG 120 

TATCGACTAC AAAGGCCTGC CTGCTGACGT CGTGCCTGGT GACATCCTGC TGCTGGACGA 180 

TGGTCGCGTC CAGTTAAAAG TACTGGAAGT TCAGGGCATG AAAGTGTTCA CCGAAGTNAC 240

CGTCGGTGGT CCCCTCTCCA ACAATAAAGG TATCAACAAA CTTGGCGGCG GTTTGTCGGC 300 

TGAAGCGCTG ACCGAAAAAG ACAAAGCAGA CATTAAGACT GCGGCGTTGA TTGGCGTAGA 360 

TTANCTGGCT GTCTCCTTCC CACNCTGTGG CGAAGATNTG 400
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 578 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CCGATTTTTT GCGAAACGTT CCGCCTGGCA TCAGGATAGT TTGTTCGTTA TCCAGTTCGG 60 

ATAGCGCATT GACGATATGC AGGCTGTTGG TCATCACCGT GATGTNATTA AAGCGCGAGA 120 

GCAGGGGAAC CATCTGCAAA ACGGTACTGC CAGCATCAAG AATGATCGAA TCGCCATCAT 180 

GGATAAAACT AACGGCAGCT TCTGCAATCA GCTCTTTCTT GTGGGTGTTG ATGAGTGTTT 240 

TATGATCGAT AGGCGGATCG GATTCCTCTT TATTCAACAC CACTCCGCCA TAAGTACGAA 300

TGACGGTTCC GGCATGTTCC AGAATGACCA GATCTTTGCG AATGGKTGTG CCTGTGGTGT 360

CAAATATTGC GCCATTCTTC AACCGAGCAT TTACCCTGCT TTGCAGATAC TCCAGAATGG 420

CGGCCTGACG CTGACGAGTT TCATGGGCGT GATACCTGAT TTAGGTTCAA ATGATAACTC 480

GCAAGCAGTA ACATCACACG NAATATCCAC GTTCAGTTAA GCGCCATGAT AGAGCATCCG 540

TGATAGGGNC AGGGGNAGTC ACACGGCGTA ATCACCGC 578
(2) INFORMATION FOR SEQ ID NO: 39: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 399 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

TGTTAGGTCA GGGCCCACAG TCAAGCTTAG GTTTTACTGA ATATACCTCA AATGTTAACA 60 

GTGCASATGC AGCAAGCAGA CGACACTTTC TGGTAGTTAT AAAAGTGCRC GTAAAATATA 120

TCACCAATAA TAATGTTTCA TATGTTAATC ATTGGGCAAT TCCTGATGAA GCCCCGGTTG 180 

AAGTACTGGC TGTGGTTGAC AGGMGATTTA ATTTTCCTGA GCCATCAACG CCTCCTGATA 240

TATCAACCAT ACGTAAATTG TTATCTCTAC GATATTTTAA AGAAAGTATC GAAAGCACCT 300 

CCAAATCTAA CTTTCAGAAA TTAAGTCGCG GTAAATATTG GATGTGCTTA AAGGACGGGG 360

AAGATTTCAT CGACACGTCN GCGTGCAATC TATCCGTAT 399



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 327 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

CAGCCTCCGT TACCGGACAG CAAGGAGGCT GAATGGAGTT TACAGGATTT GCTTTTTTAT 60 

AATGTCTGGC CATGCAGTMA AACCGGACAG GTTTTATTAT CATGTGAGGT ATTCTGACAT 120 

AAAATGCTGG ATTTTTATTT TGTGACGAAT GCTGCAAAAT TGCATCTGCA CTCTGATGTA 180 

GCTTTTATCT GTTTCAGTGA AGCATGCCCA CAAACTGAGT TATTAAGTTG TGGAAGAACA 240

GTTTTGTCCC GCCTGCATAT CTCCTTTCAA AAACCAGTAT GTCGCCATGC CTCGCCTTAA 300 

TGGAGAGCGC TGAACCATAC CTTCTTT 327



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 314 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
GGAGATGGGC ATGGAACTCA CTTCATAATA ATGCCTACCG AAGAAATATT AATAGATGAC 60 
ATTTCCACGA GNGATAGCAA TAAAACATCA GAGCAGTCTT CTCGCTTAGA AAAAGCTTTA 120 
TTAGGTTTTA CAAACACAAT GTACAGTGAT TCAAACCCTC CTATTATAGC TCGTTTTAGA 180

GACTATCTGG AAGATGGTGA GTGCATTGAC AGAATTAGCG AATCAATTTT TTTTACACCG 240

CAAGAATTCA ATCTTGCAGA TCACCACATT GAAGGATGGT TCAATGAATT TGGTCAATTC 300 

AGTGGAACTG TTTC 314 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 590 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

TCCCAAGATC TTTTTGGCCG CAAATCCACA AAACCCGTCG TTANTGTCGC GCAGCCANTT 60 

GCAGGCCGAA TTTGCACCGT TTTAGAAAGC GGCGTTTTGT AGAGCAGCAC GCAGTGAGAA 120 

GCCACCGCGC CACGACCTAC GNGCNCGCGC AGCTGGTGTA ATTGCGCCAG ACCCAGACGC 180 

TCCGGGTTTT CGATAATCAT CAGACTGGCG TTAGGCACAT CAACGCCGAC TTCAATAACG 240

GTTGTGGCAA CCAGCAGGTG TAGCTCACCT TGTTTAAACG ACGCCATCAC CGCCTGTTTC 300 

TCGGCAGGTT TCATCCGCCC GTGTACCAGG CCAACGTTCA ACTCTGGTAG CGCCAGTTTC 360 

AACTCTTCCC AGGTAGTTCC GMCGCCTGCG CTTCCAGCAA TTCCGACTCT TCAATCAACG 420

TACAAACCCA GTATGCCTGA CGACCTTCAG TTATGCAGGC GTGGTGCACC GGGTGCAATG 480

GATGTCGGTA NNGCGGGTAT CAGGAATAGC GACCGTAGTC ACTGGGCGTG CGGCCTGGGC 540 

GGCACTCCAT CTATCACCGA GGGTATCGAG ATCGGGCATA CGCNTGCATT 590



(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

GACGAAAGGG CCTCGTGATA CGCCTATTTT TATAGGTTAA TGTCATGATA ATAATGGTTT 60 

CTTAGACGTC AGGTGGCACT TTTCGGGGAA ATGTGCGCGG AACCCCTATT TGTTTATTTT 120 

TCTAAATACA TTCAAATATG TATCCGCTCA TGAGACAATA ACCCTGGATA AATGCTTCAA 180 

TAATATTGAA AAAGGAAGAG TATGAGTATT CAACATTTCC GTGTCGCCCT TATTCCCTTT 240

TTTGCGGCAT TTTGCCTTGC CTGTTTTTGC TCACCCAGAA ACGCTGGTGA AAGTAAAAGA 300 

TGCTGAAGAT CAGTTGGGTG CACGAGTGGG TTACATCGAA CTGGGATCTG CAACAGCGGT 360

AAGATCCTTG AGAGTTTTTC GCCCCGAAGG AACGTTTTTC 400

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

ATTCGGAAAG ATGCTTCTAN TTTTTTTAAG CACGTATAAA CTGTTAATTC AGGTTCAATG 60 

CTACGAAATG CACTAGTTAT AACCTGTATT GAAGGAAAGA TCTTCTGATA CTCTTTCCAG 120

AGATCTTCAA GTCTGGCCAT GGAAATTGAC TTGGCTGCAT ATTCTAGGTC AGTGTTTATG 180 

ATAGTTTCTC TATTCTCTCT GAATGCGGAA AAAAAAGCTT CATTCAACAA TGATAGTAAA 240

TCCCTGGGCC GGTAAAGGGT AAATTGCAAA CATCGCTTAA AACCATTCCT CCCTTTAAGA 300 

TCATCCGCTG TGCATCTATC CCAAACTCGT TGATCTTTCT CAATATCTAG CTTAAATGCT 360 

ACTTTCATTC TTTTAGCTGA CAGCATTAGG AGTTGTGCCC 400



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 585 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

TAATGTTGAA GACAGAGATA TAATNTACAG CATCATCCCA CAAGGCAGAT ATAACAATAC 60 

TTGACTGGGA TATGCAAAGC GATAGTGGGC AATTTGCTAT TGAAATAATA AAATCGATAA 120

TCGTTTCAGA TATAAATTCT GGAGGACGTT TACGTCTTCT TTCTATTTAT ACTGGTGNAC 180 

ATGTTACTGC TGTTATAACT AAGTTGAACA ATGAGTTAAA GAAAACATAC CGTAGCGTAA 240

TAAAAAATGA TGATAGTATT TTTATTGAAG ATAACTATGC ACTCGAACAA TGGTGTATAG 300 

TTGTTATTAG TAAAGACGTT TATGAAAAAG ATCTTCCAAA TGTGTTAATA AAAAAATTCA 360 

CTAACCTTAC AGCTGGGTTG CTATCCAACG CCGCACTCTC TTGCATTTCT GAAATAAGAG 420

AWAAAACCCA TGGGATATTA ACAAAATATA ATAATAAATT AGACACTGCA TATGTTTCCC 480

ACATCTTAAA TTTAATAAAA TCCAAGGRGT CAAGGGCATA TGCTTATGAA AATGCTCATG 540

ATTATGCAGT AGATTTAATT TCTGAAGAAA TAAGATCAAT ATTGC 585



(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 390 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

ANTCATCCAA CTGGCCGATC AGCAAAAAAG CGCGGCCTAC GATTTCACCC ACGAACTGTT 60 

AACCACGCTG GAAGTTGACG ATCCGGCGAT GGTAGCAAAG CAGATGGAAC TGGTGCTGGA 120 

AGGCTGTTTA AGCCGAATGC TGGTGAATCG TAGCCAGGCG GATGTCGACA CCGCACATCG 180 

GCTGGCGGAA GATANTCNTT GCGTTCGCCC GCTGCCGTCA GGGTGGTGCA CTGACCTGAC 240

AGAAACACAG AAAAGAAGCG ATTTGCCGCA ATCTTAAGCA GTTGAATCGC TTTTACTGAA 300 

ATTAGGTTGA CGAGATGTGC AGATTACGGT TTAATGCGCC CCGTTGCCCG GATAGCTCAG 360

TCGTAGAGCA GGGGATTGAA AATCCGTTGT 390 



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 473 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

GGATGCCAGT GTCAGCGACT GGTTAAAGTG GTCGATATCG ATGAGCAAAT TTACGCGCGC 60 

CTGCGCAATA ACAGTCGGGA AAAATTAGTC GGTGTAAGAA AGACGCCGCG TATTCCTGCC 120 

GTTCCGCTCA CGGAACTTAA CCGCGAGCAG AAGTGGCAGA TGATGTTGTC AAAGAGTATG 180 

CGTCGTTAAT TTTATCTCGT TGATACCGGG CGTCCTGCTT GCCAGATGCG ATGTTGTAGC 240

ATCTTATCCA GCAACCAGGT CGCATCCGGC AAGATCACCG TTTAGGCGTC ACATCCGTCG 300 

TCCCCTGGCA AACGGGGGCG ATTTTCCTCC ATTTGCCTCA GTGGCTGGCG TTTCATGTAA 360

CGATACATGA CAGCGCCCGA CAAGATCCTG ATACTCTTTG GGTATTCAAC CGTTTCCAGT 420

GTAATTCGTC GTTCACNAAC ATTGGCGTTA CAGGCGGGGC TGGCNGTNAC CCA 473



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 482 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
GAAGTGACGG ATGGCTGTGG TTTCTCCATC GGTCACCAGC AGCAGTTNGC ATCATGGATT 60

GCCTATAAAG TCGCGCCGTT CCTCGGNAAA AAAGAGGAGA GCCTTGAAGA CCTCAAATTG 120 

CCGGGCTGGC TGAACATTTT CCACGACAAC ATCGTCTCCA CGCGATTGTG ATGACCATCT 180 

TCTTTGGTGC CATTCTGCTC TCTTCjGTAT CGACACCGTG CAGCGATGGC AGGCAAAGTG 240

CACTGGACGG TGTACATCCT GCAAACTGGT TCTCCTTTGC GGTGGCGATC TTCATCATCA 300 

CGCAGGGTGT GCGCATGTTT GTGGGGGAAC TCTCTGAAGC ATTTAACGGC ATTTCCCAGC 360

GCCTGATCCC AGGTGGGGTT CTGGCGATTG ACTGTGCAGC TATCTATAGT TCGGGCCGAA 420

CGCCGTGGTC TGGGGGTTTA TGTGGGGCAC CATGGGTCAG CTGATTGCGG TTGGCATCCT 480

AG 482 



(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 185 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

GACGACCTGC AGGCATGCAA GCTTGGCACT GGCCGTCGTT TTACAACGTC GTGACTGGGA 60 

AAACCCTGGC GTTACCCAAC TTAATCGSCT TGCAGCACAT CCCCCTTTCG CCAGCTGGCG 120 

TAATAGCGAA GAGGCCCGCA CCGATCGCCC TTCCCAACAG TTGCGCANCT GAATGGCGAA 180 

TGGCG 185 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 491 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
TAACGCTTCA ATACGCGCGA CCAGCTGGCG GCGCTCATAC GGCGTAATTT TGGCGTCGGC 60 

GAGCAAAATC CCTTGTTTAA AGGTATTTTG CCAGCTGCCG TCGTCATATT GGCGAGCTTG 120 

CTGACGCGAC TGCGCAGGCA TTAAACGATC AGCACAATCC ATCGCCCGCA GCCAGTAAAG 180 

CGGATTGGTT TCGGTTGATT TACCTTGCAG CGCCCAGATG TCGCTACATT CAGTAGAAAG 240

ATAGTCAGCC AGTTGATAAA CCGGAATTTT TTCTTCTGCT GGCGTATCAA TGGCTGGCTT 300 

ATTGTGATTC TGCACGCAAC CCAGCAATGC CAGACATGGA GACCCTGCCA GCCACAGCCG 360

TCGGGGCAAT AATCGTTGAA AAATGTGTCG CATATTCACC AGACTTAAAG CCTATCCCAG 420

TGGGCGTAAT TGTTGCAGAC AGTCTGGACA TGGACAGCGC GGAGAAACCG GNAGCGTACA 480
TATCGTACGT G 491 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 106 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
ACTTGAACGG CAATTATTAT TTATCCATGC AACTTCAAGT TGCAGTATCG GAACATTAAC 60 
TTTTCTGGGG TGAATATCAC TCTGATATCG TTTTTTGTAT GCGTNT 106 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 481 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

TTTATGTGCG GTATTGATGG CTGAAGCCTG TAATATCGGA CTGGAACCGC TGATAAAGCA 60 

CAATATACCA GCACTGACCC GCCATCGGCT CAGTTGGGTG AAACAGAATT ACCTTCGTGC 120 

AGAAACGCTG GTCAGCGCCA ATGCCCGCCT GGTTGATTTT CAGTCCACAC TGGAGCTTGC 180 

TGGTCGTTGG GGAGGTGGAG AAGTGGCATC AGCTGACGGC ATGCGCTTTG TCACACCAGT 240

GAAGACCATC AACTCAGGAT CTAACAGAAA ATATTTTGGT TCTGGGACGA GGCATCACCT 300 

GGTATAACTT CGTATCTGGA TCAGTACTCT GGGTTCCATG GCATTGTGGT ACCCGGTACA 360

TTACGGGRCT CGATTTTGTA CTGGAAGGAC TTCTTGAGCA GCAGACAGGG CTGAATCCAG 420

TTGAAATCAT GACAGACANT GCGGGTAGCA GCGATATTAT TTTCGGTCTG TTCTGGCTAC 480

T 481 



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 558 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
TGGNCCGTAA TTCCCAACCA TTTGCCGAGG TCCAGNTTTT TCACCATGTT ACTCGGGATA 60 
GCCAAAACNG ATACCGATGT TGCCGCCGTC CCGGTGCGAG GATCGCGGTG TTGATACCGA 120

TCAGTTCGCC GTTCAGGTTA ACCAGCGCAC CACCGGAGTT ACCACGGTTG ATCGCTGCAT 180 

CGGTCTGGAT GAAGTTTTCG TAGTTTTCGG CATTCAGGCC GTACGCCCCA GCGCAGAGAC 240

AATCCCGGAA GTTACCGTCT CGCCCAGACC AAACGGGTTA CCAATCGCTA CGGTGTAATG 300

ACCCACGCGC AGTGCATCAG AATCCGCCAT CTTAATTGCG GTCAGGTTTT TCGGGTTCTG 360 

GATTTGGATC AGGGCGATAT CAGAGCGCGG ATCTTTGCCA ACCATCTTCG CGTCGAACTT 420

ACGGCCATCG CTCAGTTGAA CTTTAATGAG CGTCGNGTTA TNAACAACGT GGTTGTTGGT 480

GACGAGATAG CCTTTATGGG CATCAATGAT GACGCCGGAA CCCAGCGCCA TGAATTCTGT 540

TG2T3GCCGC CACCATTA 558

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 263 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

CACCTGCGTG ACGTGACCGA CCTTTTCTCC TCGCTGNTTG TTTCCCCTAT CGTCGGCCTG 60 

GTCATTGCGG GAGGCCTGAT ATTCCTGCTG CGACGCTACT GGCGCGGGAC GAAAAAAGCG 120 

TGACCGTATT CGCCGCATTC CGGAAGATCG CAAAAAGAAA AAACGGCAAA CGTCAACCGN 180 

CATTCTGGAC GCGTATTGCG CTGATTGTTT CCGCTGCGGG CGTGGCGTTT TCGCACGGCG 240 

CGAACGACGG ACCAAAAGGG ATC 263 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 683 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

GTAACGCGTC TGGAAGATGG CCTGCCAGTG GGCGTCGTCG ATGTGGTCGA GGGGCTGGAC 60 

GGTTGCCATT CCGCCAATAT CTCACCGGAC AACCGTACGC TGTGGGTTCC GGCATTAAAG 120 

CAGGATCGCA TTTGCCTGTT TACGGTCAGC GATGATGGTC ATCTCGTGGC GCAGGACCCT 180 

G3GGAAGT3A CCACCGTTGA AGGGGCCGGC CCGCGTCATA TGGTATTCCA TCCAAACGAA 240

CAATATG-GT ATTGCGTCAA TGAGTTAAAC AGCTCAGTGG ATGTCTGGGA ACTGAAAGAT 300 

CCGCACGGTA ATAATCGAAT GTGTC^AGAC GCTGGATATG ATGCCGGAAA ATTCTCCGAC 360 
ACCCGTTGGG CGGOKGATAT TCATATCACC CCGGATGGTC GCCATTTATA CGCCTGCGAC 420

CGTACCGCCA GCCTGATTAC CGTTTTCAGC GTTTCGGAAG ATGGCAGCGT GTTGAGTAAA 480

GAAGGCTTCC AGCCAACGGA AACCCAGCCG CGCGGCNTCA ATGTTGATCA CAGCGGCAAG 540

TATCTGATTG CCGCCGGGCA AAAATCTCAC CACATCTCGG TATACGAAAT TGTTGGCGAN 600 

CAGGGGCTAC TGCATGAAAA AGGCCGCTAT GCGGTCGGGG AGGGACCAAT GTGGGTGGTG 660

GTTAACGCAC ACTAACCGCT GAT 683



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 282 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

TGGATGCAGG GAAAAACATT GATATTACCG GGGCAACGTG CTCGTCCGGT GGAGACCTTG 60 

GAATGTCTGC GGGTAATRAC ATCAACATTG CCGTAAACCT GATAAGCGGG ACAAAAGTCA 120

GTCCGGTTTC TGGCACACTG ATGACAACAG TTCATCATCC ACCACCTCAC AGGGCAGCAG 180 

CATCAGCGCC GGCGATAACC TGGGCGATGG CTGCAGGCAG AGATKCTGGG NTGTCACAGC 240

ATCCTCTGTT TCTGCCGGGC ACAGCGCCCT GCTTTCTGCA GT 282 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 697 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 

ATGAACGGCC CCCCCCACAG CCCGTTAACA AACGGNTGCC CCGGCGATAA TCGTACTGAT 60 

AAGTTAACTC CAGCAGGCGG TTAATTGAAA GCGAACGGGA GGCTGATGCA TGGTAATAAT 120 

CCCTTAAAAC GCGACGGCAA CGCGCCAGTA AACCGTGAGA TGGTCAGGGG CAAGCCAGTC 180 

CGGGTAAACC AGAGGCAGTC CGGCAGTGAA CGAACCGGAA ACATGACCAC TGGTGGTGCT 240

GAGCCCGGCA GCAGCACCCC ACAGCGTGCC GGACGAGTAC GGGTCATCTC TGTCAGAGTG 300 

CAGCCAGCCG CCGTCCAGTG CAGTCACTGC ACGGACTGTC CCCACATATG GCAGGGAGAA 360

CAGAGACCAG GACAGCTCAT TTCGCAGATA ACCGCCGTTA TTACCGGAGA TATACTGCTC 420

CTTAAAGCCA CGCACTGAAC TCTCACCCCC GAGGCTCAGT TGTTCCACAC CATGAAGACG 480

GTCCGGTGAC CACTGGGCAT AAGCGCTGGT CAGCCACCAC ACCCTGTCCG TGACGGGGCG 540

CTGAAAACTG GCACTCACCG ACCATTTGCG GAAGTGATTT ACGGGCAGGT CTCCCCTTTT 600

CCCGTGGTCG CTTTCTGCGC CGAACCAGGG CATCCCCCGT GTGAATACCG GATTCAGTGT 660

TCCGACACCA CCCAGAAACT TGTGTGTGTG ATTCANC 697



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4835 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 

TTCGACTGAG CACCACAAAT ACTGGGTATC TCCCCAGATA GTTCATTGCG GTACAAGCAA 60 

TATAGGTGCA GAAAGTCAAC CTGCTGCACC CTATTGGATA ATTATATATG GCCTTCAATA 120 

AAGTTTGCGG TTGTCGACGT TGGCTATATC AGCCATTTCC AATGCATAGT TCTTTGGTTT 180

AGCACCATCA AGTTATAGAT TTGGGAATAG TTTCAACTGG TATTGATTGA ATTGGGTTTC 240 

ATCGTCGATG ATTAATACTA TTTGTAAAGA CTTTATTGTT GATTTCTTAT TATACCACAA 300 

ACCCAAACTG GTCTAGGTCA TCATTTGGTG TTGATAACGG GCTCTGATAA TTTCTGCTCT 360 

TCTGCTATAC TGGGGATTAT GAAGAATATT AAGGCTGAGT GTATTGAGGT AGTGTTCTTT 420

GAACCGACCA TTCATGACAA TATATTCTTC AATTCGTGAG TGATCCAGCA ACTGGTTGAA 480

TTTAAAACAC TGAGTGATGT TATCCTCTGT AATCGTATGG TTGCTGAACT AGTTGATGTA 540

GCCGATAAGG TTTATACCAG ATATCTTTTG GGGGGATTAG ATAACGTAGC CGCGGATAGC 600 

AAACGAGATA GTTGAATTTT ATTACCGTAA TTTCTTCCAT TGAGAAAAGC TTATTTTTCT 660 

TGGTGGTATT CGCAGTTATG TATCTTCCAT AAAGACTTGG GAATATCTTG CTTGAAARGC 720

TATCTGGAGA TAGCCTTAGT TATTTGATAA ATATTTCAAA TAGGAGGAGC CGTATGGCTG 780 

TCATTTATAC CCTCACTAAA TCGTCACTTG TCAAGTCTGG TGGTCAATTA CATTGGAATA 840

TTGATTCGCC ATCAGAACAA CAGCCACAAA AGATCGTCAA TGGTCGGGTT GCGCTTCGGG 900 

GATGGTTACT GGCAGATGTG GAAAAAGATC TCCGTGTTGC GGTTAAAATT GAACATTTGA 960

CATACAGTTT TCCCTTCAAT ATAAAGCGCC CTGATGTTAT TTCAGCTATA CTGAAACAGC 1020

CACCTGAAAA ACATCAAAGA CTTCATTGTG GATTTGATAT CAATGTCCCA TTTTCTACTA 1080

AAATAATTAT TGGCCTTGAG TCTGATGGGT TGATTACCTG GTTGGAAGAG TTATTATTTC 1140

TCCTGCCTGA TAATTGAATT AAGTATCTAT ACCGATAGTA TCGCGATAGA TATATTTTTT 1200 

TACAGGATGA TAATTTGAGA ATCTATATAG CCGCTATTAT CAAGGATGAG TATTCAAGTT 1260

TACTTGAATG GATTGCCTAC CATCGAGTAT TAGGTGTTGA TGGGTTTAKT ATTGCAGATA 1320 

ATGGCAGTOG TGAWGGTAGC CGAGAATTAC TATTTTCCCT CGCTCGCCTA G3TATTGTGA 1380

CGATGTTCGA ACAACCGA3T TTGGTGAATC AAAAGCCACA ATTACCTGCA TATGAACATA 1440

TTTTACGTAG CTGTCCCAGA GACATAGACC TGCTTGCATT TATAGATGCT GATGAATTTT 1500 

TATTGCCACT TGAATCGGAT ACCAATTTGT CAGATTTTTT TTCTGAAAAG TTTCAGGATG 1560 

AGAGTGTCAG CGCTATTGCA TTGAATTGGG CAAATTTTGG TTCTAGTGGT GAATGGTTTG 1620

CTGAAGAGGG GTTGGTTATT GAACGTTTTA CCTATCGTGC CCCGCAATCC TTTAACGTTC 1680

ATCATAACTT CAAAAGCGTG GTCAAACGCG AAGGAGTTAA CCGCTTTCAT AATCCGCATT 1740

ATGCTGATTT GCGTTATGGT CGATATATGG ATGCATTGGG TCGTGATTTG ATTCTGCACC 1800 

CGAGGCATGG TAATGGGGTT AGTGCTGAAG TGACTTGGAG CGGTGTCAGG GTAAATCACT 1860

ATGCAGTTAA ATCACTTGAG GAATTCTTGT TGGGCAAGCA TCTGCGTGGT AGTGCTGCCA 1920

CTGCTAATCG AGTAAAGCAT AAAGATTATT TCAAGGCACA TGATCGTAAT GATGAAGAGT 1980

GCCTTCTCGC TGGCGCATTC TCAGAACAAG TAAAAGCTGA AATGGAACGA TTAAGTGTGA 2040

AGTTGACTGA GTTACGAGCA GTTGAACCTA TTCCTACTGG TTCTTGGTTC AAAAAAAAAA 2100 

TGAAGAAATG GATGGTTTGA ATATATTGAG CAAGCAGTTT GGTATTTATT TCTGCTCTTA 2160 

TCTACAGGTC TGCTAATAAG GATCTGTATC CCCCAGGTGT TACCTTGGAC TGTAAGTTAT 2220 

ATTATGTGTA GCTATTGCGA TTGGCAGCCT CTGACATTGC GAGACTCGTT TTCTCTTCAT 2280 

TCTGGTTGGC TTCTGATTCG GGGGCGCGTG TTGACGACTC AAACTCGAGG TGAAACTCGT 2340

CTGCGCTGGC AATGCGGACA AGGAATATGG CATGAACAGA AGTTGCCGGT CACTGGTCGA 2400

GGCACGTTGC TGGAGCTGGT TTATCTACCY TCGGGAGCTA GTCATTKGTG TTTGCTGGCA 2460

AGTAATAAGG GCGCTGAGTG TAATGTTGAA ATTACTCAGC TTTGTTGTGT ATCCCGTGCC 2520 

GAGAGTCTCT GGCGTCGATT GCGCCGGGTT GTACCTTTTT ACCGACGCTT AACGAAGTCC 2580 

AGACGCAAAA GGTTAGGCCT TTCATGGCAT TTGTGGCTCA CGGACTTGCA GCAAGCTTAC 2640

CAACTTGTCA GCAGAGTTCG CGATGATAAA CCACTCAATA GCTATGATGA GTGGCTAGCA 2700 

GACTTCGACA CCCTTGAACC CGCCGAATAC AAGCTGATTA AGCGCCAGCT GGCTCGCTGG 2760

GGCACATTAC CACGTTTCTG TTTGCATCTT GTTGGCGTTG GGGATGAACA GAGCCGCCAC 2820

AAGACCCTGG AGAGTATTCA GGCAGTCTGT TATCCGGCAA GCAATATAAA CCTGCAGGAG 2880

CATGGTGCAT ATCCAGAAAT CTCCAGTCAG TCAAGCGGCG AATGGCAGTG GGTGTTGCCT 2940

GTAGGGGCAG TGGTTTCGCC AAGCGCCTTA TTTTGGGTTG CCCACCAGTT ACGCCAGAAT 3000 

CCTGATTGTT TATGGATATA CGGTGATCAC GATCTGCTTG ACGAGAGAGG TGAACGTCAC 3060 

TCTCCCAACT TCAAACCTGA TTGGAATGAA ACGCTGCTAC AGAGCCAAAA CTATATTAGT 3120

TGGTGTGGTT TGTGGCGTGA ACAAGGTGCT GGCCGTGTTC CCTTTGATGG GGCGACATGC 3180 

CATCAGTGGT GGCTACAGTT GGCAAAGATG TGTGAACCGA AACAGATAGT CCATATTCCA 3240

TCATTGATGA TGCATTTGCC TGCAAGAGCG TTGATTTCGG ATGATTTTGA GTCGCTGAAA 3300 
GATAAAGAAG ATTTACTGCC ATCAGGAGTG AGCATTGAGG CAGCACCTCA TGGTGTATGT 3360

CGTTGGCGCT GGCCGTTCCC AGCGCAATTG GCATTGGTTT CAGTGATTAT CCGTACTAGA 3420

AATGGTATTG CTCATTTACG CCCTTGTATC GAAAGCCTGA TACAAAAGAC GGAATATGCC 3480

AATATGGAAG TCATAGTGAT GGATAATCAG AGCGATGAGG AGGAGACGGT TGCTTATCTT 3540

GCTCATATCG AACAGGTTTA TGGCGTTAGG GTGATTTCTT ATGATCAACC GTTTAACTAT 3600

TCAGCCATCA ACAATCTGGC AGTGAGAAAC GCACATGGAG ATATGATATG TTTGCTGAAT 3660

AATGATACTC AGGTAATCAG TATTGACTGG CTGGATGAAA TGGTTTCTGA TTTATTACGC 3720

CCCGGCGTGG GTGTGGTAGG AGCAAAGCTG TATTACGGAA ATGGCTTGAT TCAGCATGCA 3780

GGCGATGCTG TCGGCCCTGG CGGTTGTGCA GATCATTTTC ATAATGGTTT GTCAGCTAAC 3840

GATCCTGGAT ATCAGCGTAG GGCTGTTAGT GCCCAAGAGC TGTCAGCTGT GACTGCAGCT 3900

TGTTTATTGA CTCATAAAGA GTTATATCTG GCGCTCGGAG GACTTGATGA AACGAATTTG 3960

CCGATAGCTT TTAATGACGT RGATTATTGT CTCAGAGTTC GAGATGCTGG CTGGAGAGTA 4020

ATCTGGACTC CCTTCGCTGA ATTGTATCAT CATGAGTCTA TTTCCCGTGG TAAAGATGTA 4080

TCAAAACAAC AGCAGATACG AGCGAAATCT GAGTTGCGCT ATATGAAAAA ACGATGGGCA 4140

TGTGCACTTA AACACGATCC AGCCTACAAC CAAAATTTGA GTTATGAACG TCCTGATTTC 4200

TCTTTAAGTA GAGCTCCTAA TATAGTATTG CCATGGATGA ATTAATTCGC AGGAAACTAT 4260

TTAAGCCTTA TCGTAAATTA AATAAACAGA GTTATAGAAG TCGGCAAAGC TCTGAGATTA 4320

ACTTTGAACG ATTGTTTATA TTACATGAGG GAAAATCACC TACATTAGCC TATTTTGAAT 4380

CGGCTATTAT AAGTCGGTTT GCTGATGCAG AATGTCATTT TATCGACACA TTAGCATCCA 4440

CTGATATATT TATTCCTAGA GGATCTGCCC TTGTCGTCAT TAGATTCATC TCCCCAAAAT 4500

GGCAACAGCA CATAGAAAGA TATAACGACA GGTTTTCTCG AATTGTTTAT TTTATGGATG 4560

ACGACCTGTT TGACCCGACT GCACTATCTA CGTTACCAAA AGAGTATCGT ACCAAGATAA 4620

TAAGGAGGTC GGCGGCTGAG CATCGATGGA TTACGCAATA TTGTGATAAC ATTTGGGTTT 4680

CAACTGCCTA TTTGGCTAAT AAATATGCAC ATCTTAACCC GGAGATTGTT TCTGCTAAAC 4740

CGTCACTGGC ACTCATTGAA ACACATCGAT CAGTAAAAAT CGCTTATCAT GGCTCAAGTT 4800

CTCATCGGGA AGAAAAATAT TGGTTGAGAC AAATC 4835



(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1746 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 






WO 98/22575 PCT/US97/21347


GAAAAATGNC ATAACCGCAT TCCATCAAGC CCGTNAATAT 2CCGGACTTT CATTTATTTC 60 

TGAGGCGTAC AGGGAAGCAA TAACTGCTGG TCAGATATTG CTGTCTCCSG TACATTTACG 120 

TGACACTGTA TTTTTGCATG CCAGTTTACC GACAGGGTTT CCCCCGGCGT CACGCCACTC 180

AGCCAGGCAA GGCCTTCGTC GGCCACCATG CCCAGTTCCC GGCCTTTTTC ACTGGTTACA 240

CTGGCACCAA AGGGGGGCTG AGAGCCATCA GCAAGACGCA GTATTGCAAA CAGACGTTTC 300 

CCTTTAAGCA CGCTGAATTT CCGGTAACCA ATGGGACCTT CTGTCAGCGC CGATTCCACA 360 

ACAGAACGGG TTGCTTCCAC ATCATGCGGT AAGCGCTTCA GGTCAACAGA GGTTGTATTC 420

CGGTAATAAC TGCTGATGTC AGTCACCACG CCCGTTCCCC AGGGATTTGT CACCACCTGC 480

CCGCCATCAA CCGGTACACC TCCCACACCA TCCGTGTCAA CAAGAAGACG TGTTCCACCG 540

GACATTCCCC CTGCATGTAA CGCCGCACCT TTTGGGGTAA TTGTTGCCCC ACCGGAAGCA 600 

GTGACGCCGA AAGACGTATA TCCTTTCTGC AGGGATGCAA TATTCGCGGA CAAATTTGCC 660 

AGCGGACTAC GATGAGTGTA ATAGGCATTA ATCTGACGTT GCGATGTCAG TCCACCGCCA 720

CTGTTAAGGG CGGCGTTCAG GCTGTAGCTG TCCAGACCGT CATTGAACGT GWCAGTGTAG 780

CCGGCCATAT TCACATAACG GTCATTACTC ATACTGCCAC TGTAGCTCGC TGTCGCCGTC 840

CGCCAGGGGG ACGGATATAG GCAGGTAAGC AGAATCNTTA TCACGCCCCA GATATTTAGA 900 

CCTTGAGGCT GACAATCCAA CCGCCACACC CTGGAGTCCG AAAACATTAA AGTAGCGGTT 960

GACGCTCAGC GTATAATAGT CCGTTTTCCG TATGTGCCAG TATGTCTGAC GGCTGTACTG 1020 

CAGGTTAAAA GAGGTGTTCC AGTCCGCCAC GTTTTTATTC AGCGTAACGG TATACATCTC 1080

TTTTTCCCGA CTGCTGTAAT CATTACGGTA GCGGGCGTTC AGGTACTGCT CCATGGTCAT 1140

ATAGTTTCGC TCTGAGAAAC GATACCCGGC GAACGTAATG TCGGCATCCG CATTATCAAA 1200 

CCGTTTGGAG TAGCTCAGAC GCCAGGATTT TCCCTGAAAC GTTCTCTCTC CCTGAATACG 1260

GGCTACTGAC TGCGTGATAT CAGCGGAAAG GGTCCCCGGC ACACCCAGGT CCCAGCCGGC 1320 

ACCGGCTGGG AGTGCATTAT AATCACCGGC AAGCAGAGCC CCGCCATACA GCGACCACTG 1380

GTTACTGAGC CCCCAGGATG CCTCTCCGGT CGCAAATACA GGCCGTTCGG TCTCATGCCC 1440

GTATCCACGG GAACGACCGG AGAGAAGTTT GTACCGGACC TGTCCCGGAC GCGTCAGATA 1500 

AGGAACCGAG GCCGTATCGA CCTGAAAGTT TTCTTCCGTC CGTTCTGTTC AATAACCTCA 1560 

ACATCAAGAC GTCCGCGAAC TGAACTGTGC AGGTCCTGAA TACTGAATGG CCCTGCGGGG 1620

ACCATCGAGT CGTACAGCAC CCGTGCCTGC TGCGACACCA CAAGACGGGC ATTAGTCTCC 1680 

GCAATCCCGG TAATCTGCGG TGCATAAGCC TTCGCATTCT TGGGGCGGCA CATTCCGGGT 1740

CAGCGN 1746

(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 723 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:



TGTACTGAGC ACGGCGAATA TCCAGTGTTC AAATTCCACT TTGCAGCGAC iGCATGATG1 60

CTGCGGCGCG GTAACAATCA GGGCATTACT GTGTTTGCTG GCGGCGATGG AGACAACCiC 120

ACGCCCGCTA CCGACCGTGC CTTCCGCCTC TTCTTTAGCC GCCGTGAGCG TGCCGCTGAC 180

CTGCTTCAGC ACATCGACCA GATCTTCGGC TTTGCTGTAT TTGAGATAGA AAACCTGGCT 240

GTTGCCGCTG CGTTCCATTT CTGAGTCCAG CCGACGGATC AGGCGGCGCA TTTTGTCCCG 300

CGTGGCCGGG TCACCACTGA CAATCACACT GTTGGTGCGT TCGTCGGCGA CAATTTGAGA 360

TTTCAGCGTC GCAGGCTGGT TCTCGCCGCT GTTTTTAGTC AGGCTTTCCA GCACGCGGGC 420

GATTTCCGAA GCAGAGGCGT TATCCAGCGG GATCACCTCT TCAGTGCGAT TANCCGCGTG 480

ATCCACACGC TGGATCACTT CCGTCAGCCG CTCCACGACG GAGGCGCGCC CGGTGAGCAT 540

AATCACGTTG GAGGGATCGT AATTAACAAC GTTGCCTGAG CCTGCGCTGT CGATCATCTG 600

GCGCAGAATC GGTGCCAGTT CGCGTACCGA AACATNACGT ACCGGCACGA CTTTGGTGAC 660

CATTTCATCG CCCGCGTATT GTCGCTGCCT TCACCAACCA GCGGCAGGGC TCGACTTTCG 720

CGG 723



(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2556 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

TAGAGGATCC CCGGCGTTGC GATCGTCACG AACATAGACC CACAKCCGTC CGGTAGGTAT 60 

TTACCCTGAC CCGGYTCCAG TACATTTACC GGCGTGTCAT CGGCATGCAC TTTACCCGGC 120 

ATCAGCACAT AGTGCTTCAG TTCATCATAC AGCGGGCGAA GCTGCTCTCC CATGATGTCA 180 

ACCCAGCGCC CCATCGTATT GCAGTGCAGC TCCACGCCCT GGCGGGCATA GATTTCCGAC 240 

TGACGGTACA GCGGCAGATG CTCGGCGAAC TTAGCCATGA TTATGCGGGC CAGCAGAGCC 300 

GGACTGGCGT AACTGCGCTC GATGGGTTTT GGTGGCTGCG GAGCCTGAAC TATACAGTCG 360 

CACCGGCTGC AGGCCAGTTT TGGGCGAACC GTTTCGATTA CCCTGAACGC GGTGTTGATG 420

ATATCCAGTT GTTCAGAGAT GCTTTCTCCC AGCGGTTTCA GTTTGCCGCT GCAGACGGGG 480

CATTCGGTTT CTGCCGGGGA GATAAC2T3C CTGTCACGGG GAA3TGTTGC CGGAAGTGCT 540

TTGCGGACGG GAGAGTCTGA TGTTTTCGGC GCTGTCTCTC CGGCCATTGA GGTGAGTTGC 600 

AACTGCGCCT CACCAAGCCT GTTCTGGAGC TCGGTTATAC GCGTTTCTGC CCGTGCGATC 660 

TTCTTTTCTA TCTTCTCGCG GCTTTTCTCG CTGCTGCGAC CGAACAACAT TCTCTGTAGT 720

TTAGCGACCA GCGCTCTGAG TGAGCTGATC TCGCGGCATA GCCGGTTATT TCACCAGACA 780 

GACGGACGAT AACAGCCTGC TGTGCGATCA GCAGGGCCTT CAGTTGCTCG ATGTCGTCGG 840

GGAGTGTGTT GTTCATTCCC CTGTTTTATC ACGGGTTATA TCCGGATGCC AGGCCGTTCT 900 

GTCCGTTTGG GATGTTGCCA CGCGATCCCC TCCAGTAGCA TGGATAACTG AGCTGGCGTC 960 

AGGTGCACTT TCCCTTCCCG GGTTACCGGC CAGACGAAGC GGCCCCGTTC CAGGCGTTTG 1020 

GCGAACAGGC ATAACCCGTC ACGATCGGCC CACAGTATTT TCACCATTTT GCCACTGCGG 1080 

CCCCGGAAGA CGAAGATATG CCCGGAGAAC GGGTCATCTT TCAGCGTGTT CTGCACCTTC 1140

GAAGCCAGGC CGTTGAAGCC ACAACGCATA TCTGTGATGC CAGCGATGAT CCAGATTCTG 1200

GTACCGGTTG GCAGCGTTAT CATCGGGTAC CTCCTTTTAT TTCGCGGATT AGCGCCCGTA 1260

ACATTTCCGG AGTGAGAGGG TCAAACAGTT TTACCACACC TGATTTAAGA TGCAGCTCGC 1320 

ACCGTGGGAC GTTTCCGGGA TCACACTCAG GGCACTCATC AGGCTTGTTA CGCCAGAAGG 1380 

GATTTGTAAC TGGTCTGGTC GGCTCTGGCG TATCAGTCAG AGCCACCGGG ACAGGCATGC 1440

ATTCCTGTAT GTCATCATGG CTCAGTAAGC CGTCCTCGTA CTGGCTTTTC CATTTAAACA 1500

GCAGGTTATC ATTGATACCG TGCTCTCTGG CGATCCGGGC AACAACAGCA CCGGGCTGTA 1560

ATGCCTGCTT AGCCAGACGG ACCTTAAATT CACGGCTGTA GCTGGCTCGC CGTTCTTTTC 1620 

GCCATGTGCC TTCGCTGATT TGAGGCTCTG TTAATTCCTT CTTTCTGTTG GCATAAAGGA 1680 

TGGCGTCAAG CTGAGCTAAT GAAACTGAAT CGGGCAATGG CCATGCGATA CCGGATGCAA 1740

TAAATCGCTG AAAAAGCGTA TGTATTGTGG AATGACTGAG ACCTAGACGC TGAGCGATGG 1800

CCCGGATGGT CAGTTTATCT TCAAATCTTA AACGCAGAGC ATCAGGCAAA TAAGAACGGA 1860

AGCAGGGAAT ATCTTTTTTT GTCTGGGAAT TCATGGTTCG TGTCCATCTA TATAGATGGG 1920 

CGCGATTGTT GCCAGACAGG ACAATTTTCA CAAGACGTGG CAGATGGGGC GCTTACCAGA 1980 

AATGCGCGGG TACGACAGTG ACTCGTCAAA TCTCAGTTGT AGCACACGCG GGATCAATTC 2040

CGGATTGTCT GCCAGTACCG CCTTTCGTGC ATTCATCTTA AATGTCCCTT TACTGCAAAA 2100 

ATGGACATTA GTATCGGAAA CAGGAAAGGG AGGCGAAAGA CGGTTTAAAT GAGACGGTTA 2160 

CCATTGTGTC GGGCTGTGTA CGTTCTCCCC GGACAGACAG CCTCAGTTCG TAGAATCTAT 2220 

AAATTACTGC TACTGATGCT GCCGGGGAAA GGCGTAACGA AAAAACAGCC TCCGTTACCG 2280 

GACAGCAAGG AGGCTGAATG GAGTTTACAG GATTTGCTTT TTTATAATGT CTGGCCATGC 2340

AGTAAAACCG GACAGGTTTT ATTATCATGT GAGGTATTCT GACATAAAAT GCTGGATTTT 2400

TATTTTGTGA CGAATGCTGC AAAATTGCAT CTGCACTCTG ATGTAGCTTT TATCTGTTTG 2460

AGTGAAGCAT GCCCACAAAC TGAGTTATTA AGTTGTGGAA GAACAGTTTT GTCCCGCCTG 2520

CATCTCTCCT TTCAAAAACC AGTATGTCGC CATGCC 2556

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 790 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
CAGTTAGTGT TAAAAAATNT CCTCTGCTNC AGAAATTACA CCCACCAATA TACAATNATT 60 

AATAAATTTT CGGTTGGGTT AGGTAATGGC TGGGATTCGA TAATATCTCT TGATGGGGTT 120 

GAACAGAGTG AGGAAATATT ACGCTGGTAC ACAGCCGGCT CAAAAACAGT AAAGATTGAG 180 

AGCAGGTTGT ATGGTGAAGA GGGAAAGAGA AAACCCGGGG AGCTATCTGG TTCTATGACT 240

ATGGTTCTGA GTTTCCCCTG AATAAGATGA TGGATTATCT GACTGGCTGT TCATCAGTCG 300 

GATAATGATG AAAACTGATG AGCAACAGGT TGTGCGGGCA ATGTGCAGGA TCCGTCACCA 360 

AAGGGTGGAA GTTGCGGGCG ACTCAGATAA ACGGGTTACA TGAGCTATTT CTGGAGTTTG 420

ACGAAGCCGT CTGGAAGGGA GAAGAGGCGA TTCCATTGAT GTCTCTGGAA AACATCTGTC 480

AGTCGTGCTG CTGGAAATAT TGATAGAGCA ATGGGAATGG TTATCCAACA TTGATGAACA 540

TATTGTATAT TTACAGAAAT TTTTAAAAAC AGGACTCAGC AGGTTAAATC GTGTAAAAAT 600 

TACTCATGAA TACCATTATG GGCTTACAAA GCGATGTGGT TAAGCAGATC TTATTCAGGC 660 

CTGTGCAGCG TAGGATTACA ATAGGATCGA ATAACGCCAT ACAGGGGAAT GGGAGATAGG 720 

CTGATTCATC CTGTGGCTAT AACCAGGAGC ATATCGGGAA TCMANTATGT TACCCCAGAT 780

GGAACACCAT 790

(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10906 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

GCGGCCGCAG TACTGGATCT CTTTGCGGCA TGACGATGAG GGGGAGAGAA ATAAACTTAA 60 

CCCAGTCATG GCAGATGAAG AACAGGCTTA CGTAAAAGGG TTATATGAAG GGATTATGCT 120 

GATTGGTAAT ATAATCAATA AGCCTGAAGA AGCTAAAGCG TTAATCAAGG CAACTGAAAA 180 





TGGCTGCAGA ATGGTGAGTA ACCGGCTGCA ACT7CTACCC GAAGAGCAGC GTGTTGGTGC 240

CTATATGGCG AATCCTGAAT TGA"ACTTA TGGTTCCGGA AAATATACAG GATTAATGAT 300 

GAAACATGcT GGCGCAGTAA ACGT "GCCGC TTCCACCATT AAAGGTTTCA AACAGGTCTC 360 

GATAGAGCAA GTCATTGAAT GGAATCCTCA GGTAATTTTT GTGCAGAATC GTTATCCTGC 420

TGTAGTGAAT GAAATACAGT CAA3GCCACA GTGGCAGGTA ATAGATGCTG TCAAAAATCA 480

TCGTGTTTAT TTGATGCCAG AGTATGGCAA AGCATGGGGC TATCCGATGC CCGAGGCTAT 540

GGGGaTTGGG GAATTGTGGA TGGCGAAAAA GGTGTATCCA GAAAAATTCA ATGATGTTGA 600 

TATGCATAAA ATAGTCAATG ACTGGTATAG AACGTTTTAC CGTACTGATT ATCAGGGTGA 660 

AGACTAATGC GAGTGCTTGC TGCGGGCAGT TTACGCCGGG TATGGAAATC ACTTGTGTCA 720 

GAGTATCAGG CCGATAATAT ACAGTGTGAT TTTGGACCAG CGGGTATATT AAGGGAGCGT 780

ATTGAGGTGG GTGAGGCATG CGATTTTTTT GCATCAGCCA ATATGACTCA CCCACAGATA 840

TTAATGtCCG CAGGanGAGC ATTGTGTATT AAACCTTTTG CCAGAAATCG TTTGTGTTTG 900 

TATGTTCGGG CGAATAAATT CAATGAGAAT GACGACTGGT ATTCTTTATT AAATCGGGAA 960 

ACATTGCGAA TCGGAACATC AACGGCGGGA TGTGATCCAT CTGGTGATTA CACTCAGGAA 1020

CTGTTTGAAA ATATGGGGAG TGTCGGTGAA AAAATAAGGC AACGGGCTGT AGCATTAGTT 1080

GGGcgGGAGG CATTCGTTTC CTCTTCCAGG AAATGCGATA gcAGCGCAGT GGTTAATTGA 1140

AAATGATTAT ACTGATCTGT TCATCGGTTA TGCCAATTAC GCTCCTGGCT TGCAATCAAT 1200 

TGATTCAGTA AAAGTTATAG AAATACCGGA ACCTTATAAT CCGATTGCTA TCTATGGATT 1260 

TGCCTGTCTG ACCGATAATG CCCTGCCACT TGCCGACTTT TTAGTTTCAC CTGTTGCCAG 1320 

AGGTATACTT GAACAGCATG GGTTTATGCC TCCAGGTACG TTATAGCCCC CTGTCTTACA 1380

GCTGtCTCTT gATCAGATCT CCTGATCAAG AGACTTCATC ACCAGGTAAC CCTCAACCAT 1440

ATCCTGCATA TCCTGAAGTC TGAACCAGCC ATCCCACATA ACTACCCAAC CGGGGCGGCC 1500 

TGTGCGTTTG CTGTCATGCC ATCGCCCCAG TTTCGCCAGT TTCAGACAGG CCCATTTCAG 1560

TGTCGGCGTC TGTGACGGAA GCGGTTTTCC TTCCAGCTTA ACCCACAGCA GTTTCCACTC 1620 

TGTCGGCGTC AGTATTTTCT TACAGCTGTC ATTTTGTGTT TCTTCACTGA TACCTCCCTG 1680

CCGCAGGCCa GCACCCGTAC CGCGATAAAC GCCTTGATAA CCACCATGCG CTCAAGGTTA 1740

TCCCGGGTCT GCATTCGCAG CGATTCCACA CATGTACCAC CACTTTTCCA CGCCTTGTGG 1800

TATTCCTCTA TCAGCCaGCG TCGCTCGTAA TGGCTGACGA TACGTCGCGC ATCGGCGGCA 1860

CTCGCCACTT TTTCTGACGT CAGCAGATGC CAGCAGGCAC CGTCCTCTGC CTGCTCCCGG 1920

CAACAGACAT ACGTGAGCGG GAGCGCCTGG CCGCTGTTGT CGGGATTTTT TATGCTGAcT 1980

TCGTTGTAAC TGATGAACAT CCGGGCCtgg CGGGCTGCCC GCCCGCCTTT TTGCATCACA 2040

TTCAGCGTGT GGCTTCCCGC GGTTGCCAGG ACTTCCGGCA GTTCGAAGAG CTTGCCGGGT 2100 

GCTTCTTCCA GCCGGCGATT CTGTGCAGCA CGCACCACGA AGCGCTGTCC GTGGCTGACT 2160 

TTATAATGCA GGTAATGCTA GATATCCGCT TCCCGGTCAC AGACAGTGAT TACCCGTTTC 2220 

TGTATCTCCC CCAGCCGTTC GGCCATACGC TCCGAAGCCT GCTGCCAGCG GTAACTTTCT 2280

TTTTCTTCAT AGGGACGTTC TTTTCGCTGG TGCTTAACAC CATAGGTgtC CGTGACCCGA 2340

CTCCAGCGCT GCTGTTCGAT AAGACCGACT GGCAGGGCGC TGTCGGGGGC GTACATCAGG 2400

ACAGAGTGAG CCAGCAGCCC GCGCGTCTTC GGGTTAGTGG TGGTATTCCC CAGGTCATCA 2460

GATGCCGTAC TGTGGCTGAA GTTAATGGTG GTGGTGTCTT CCAGTGCGAG GAGCAGCGGA 2520

TGAGCCTCAC ATGCCCTTAC AGTGGCGGTA AATCCGGCTT CGGCAATGGC TTGCGGGGAC 2580 

ACAGACGGGT TACGTATCAG GCGGTACGCA CCTTCAACCT GAGGAGTGGA CTGGGATGAT 2640

TTCACAATAG AAAGACCTGC ATGCTGAGCG AGAGAAGAGG TCAGTGACAC AAGGCGTCGT 2700 

GTACGACGCG GATCACCGAG ACGGGCATGT CCAAACTGCT CGTTAGCCCA TGAATAACAA 2760

TCAGAAAGTA CCATAACAGA GTCGAATAAA ATGAAATATA AGAGAAGATG AACGGGTGAA 2820

GAAAAAGTTC AAAAAATGGC TACCGGGGAG GAAGGAAAGT ACCGGATGGA AAGAGCCCCC 2880

CTAAAGCAGA CTGACAGACA TCACAAATCC CCGGGGGGGA CTTGTGTATA AGAGACAGGT 2940

CTTACAGGGG GAGCGTCCGT CTTTTTATCA ACATCAGGCA ATGACATAAC ATTATGAACA 3000 

AGCTCACAAG TCTGATGGTT AAATTTTATA ATGCTCCTTA CTAAGACCGT ATTTTTTCAT 3060

TCTGAGATAG AGTTTTTTCC GCGGGATTTG TAAATATTCA GCAACCTCAT TGATACGCCC 3120 

CTGATGGATA TTAAGTGCCT CTGTGATTAT CTGTCGCTCA GCGTCCTCCA CTCGTCTGTC 3180

AAGCGGTGTC GGGGTTCCGA CGTGCATCAA CGGAT7TGCT GTTTCTGCCA GCGGTAATAC 3240

TCCTACAGTA AATAGTTCTG CTGCATTGGC CAGCTCTCGC ACATTATTTG GCCACATGCG 3300 

GCGCATCATC TCTTTGAGCA TCTCTTTTCC CACTTCCGGA ACAGGATGGT TAAGCCGTTG 3360 

ACATGCTTTA CAAAGGTAAT GGCGAAACAG TGGTTCAATA TCATCGGGGC GTTGAGTTAA 3420

TGGCAGGCAA GCGATTTGTG TCATTGCAAA GCAGTAATAG AGCTCCGCGA TGATATGGTT 3480

GCTGGCGGCC AGCTCGACCA GCGAAGTGTC TCCAATACCA ATCAGGCGAA AAGGTCGGTG 3540

TTCCTGGCTT TGTAACTGAA CCAGATGGTA CTGCTGTTCA CGCGTCAGGT GTTCAGGATG 3600 

GCTGAGCACT AATGTTCCCC CCTGAGCCAG CGCAATGAAA TCATTAAGCT GTGGTGCATT 3660

GTCTGGTGTC AGCTCGCGGT AGATAAATTC GCCTTGTGGA TTACGTCCAA ATTGGTGCAG 3720 

ATAACGTGCA CCGGTCATCC GTCCTGTGCC TGGGGCACcG TAGAGCCAGA CGGCAATATC 3780

TGTTTCAGAC AACTGCTGTA AACGTCGCCG ATACTGATTT ATCCATTGAC TTCTCCCTAT 3840

CAACTCCACC TGCAACGTCT GTTGGCAATA CTGACGACGC GCAATGATTG ATTGACGCTG 3900

GCGTAgcGCC TCTTCAACCA GAGAAAGCAA TTTGCCGGGA TCAACCGGTT TTTGCAAAAA 3960 

ATCCCACGCG CCTTTTTTTA CCGCATCAAC TGCCATTGGC ACGTCGCCGT GCCCGGTaAT 4020
AAGCAGAATG GGGATCTGTT GATCATCCTG GTGAAATAAC ATCATCAAAT CGATACCAGA 4080

GCAGCCAGGC ATACACACAT CA2TTAGCAC AATACCTGGC CAGTCTGGTT GTATCCACGT 4140

CTGCGCCTCA AAAGGATTGT TACAGGCAAA AACCCGATAG CCTGACTGTT CAAGTAACTG 4200

TGTGTAGGCG TCCAGCACGT CAGCATCATC ATCAATCAGC AGAATCGAAT ATTCACTACT 4260

TAGCATCTTC CACATCCGTT AGTCTGAATT GCAGTACCAC ACAGGCATTC CTGGTCATCG 4320

TTGATGCCAG CCGTAATTCA CCTTTCATTT GCTCCATCAA CGACACACAA ATTGAAAGAC 4380

CAATACCCAG TCCTACTTCT TTACTGGTGG TAAACGGCTT CAATAACGAA GGCAACAATG 4440

CCTCAGGCCA GCCCGGGCCA TTATCGCCAA TGAATACGTT CAGCGTTTTA CCCTGCATTT 4500

GCCAGTTAAC GGTAATGACA GCGCCTTGCC CACAAACATC AAGCGCATTC GCCAGTACGT 4560

TAACCAGTAC CTGCTGGGTT CTGACCTCAT CGCCTGAAAC TGTGGCTGTA CCTTGCGGCA 4620

GAAGAAGCGT AGCTTGCAAA GGGCGATGAC GCATGGCCAG AAGTTCCCAG GCCGCACTGA 4680

ACATCTGTGC TAAATCAACG GAATGGAGTG ATATTTCCAG TTCGGCGCGC CGGGTAAACT 4740

GCCGTAGTGA ACGGATAATG GCGTCAATGC GACCAATCAC CCCTTCGGCT TTACCAAGCA 4800

TCATGCTGGC CTGTTCTGTC TGGGTCTGTT CAATGCCTGC GGGCTGTAAA CAGATACATC 4860

GACAGCGCAT TTAGCGGCTG ATTGATCTCG TGGGCCAGCG TGGTCATCGT TTGCCCGACT 4920

ANCCGCAGCT TCGCTGTCTG AATCAGTTCG TCCTGGGTGG CTCGCAGATC GGCTTCTATC 4980

ACCTTTCGAT CGGTAATTTC TTGTTCAAGT TGCTGTTTTT GCACATTGAG CTGCCCGAGA 5040

GTATGGCGTA ATAATCCTGC AATTCTCCCC AGTTCATCAT TCCCATAAAC AGGAATAGCC 5100

GTTTCCGTGC CTCCCAGACC AATTTGCACA ACGGCCTGAT TCAGTAGGGT AAAGCGTTTC 5160

ACCAACCGTG AGCGGATAAA ATAATGGTTG AATACCCATG CCAGCAGTAA CGCCAGTGCT 5220

GTCGCCACGA GGATCAGCCC ACCGCTAACG CGAACAATTT GTTCCATTCG TTGATTAAAC 5280

ATCTGCATTT GTTGATGAGT ACTGCCAAGT GCGCTTCCAG TAACGTTCTG AAGCGACCCA 5340

GTGTCGCTTC CCTGGTGCGA CTGGCATCCT CTAAGGCTTT TTGGGCGGTG ACATATTCAC 5400

GCATCGTAGC CGGCATTTTG TTTTTTACGA TTCCCATATC CAGCAATTCA TCGATAGTCT 5460

GCGTCAGGGT AATGGTGCCA GGCCAGTCAT CCAGCATACG TATATTTTCA TCTGCCGTTT 5520

TTTTCAGATT TTCAAAATAA CGGAGATGAG TTTCCACCTG TGTGTCGTCA TCACGTCCTG 5580

ATTTGAGTTC ATTGAGTCTG TCACGCAGAT CGTCAACAAT CTGATTTTCA ATGCGTGCCA 5640

GGGTATAAAC CTGCTGCTGT TCATTTTGCA CTTCACGAGA TCGCTTCAGG TATTGCGCCG 5700

TATCGCCYTG TCGGGAGGCG ATTTGATCCA GCAGCGTTCC CTGCTGCCAG GTGAAATCCT 5760

GCACTAAAGA ATTAAGCTCG GTAGTAAAAT CATCGTGTAA CCAGTCAATC CTCGCTGATA 5820

GCTCACTCAC CTTTTCCCGT AGTAAAAACA TGTTGTAAAG CGCACGATCC AACTCGGATA 5880

ACAGTGATCG ACTGTCCTGC AAAATGACCG TCAGTTGTTG GCGTTCCCGG GATGACAGCC 5940

CCCGACTAAG CCGTTCTATG GTGTCGAGAT GCTGAATAAT CTGGGTACGA AGTTGCAATC 6000 

GCACCGTGGT GTTGGGAGCC TGCAAAAATT CATTTAGCTG GTCTACCACC AGATTCAGGT 6060 

TCCCTTCAAT AAGGAAAGCA GAGTGAATAC GGGGAAAATA CTCATCCAGC GAGTAACGAA 6120

TTTGTGAGCT TTGTTCATGC CATGAATACA GACTGACACT ACTGACAATC AGGGTCAGAA 6180

GTGCCCCCAT CAGAAATGCG CAACGTAAGC TGGTACTGAT ACTGACCTGT CTTAAACGCT 6240

GCCACAGCGT TATGTTTTTC ATTTCAGCTC TTCCAGTTTT TTTATCGCCA GGCGCTGGTT 6300 

ATTCAGAAAC CAGAGTTGCC ATTCCATCAT TTGCTGCTCG GCAAAGCTTT TGTTATCGAA 6360 

CTGTGCCAGC CAGACGGGAT CTTCACTGCT GGCCGCTGCA ACGGGCACTT GTGTTAACAG 6420

TGCACGTATT TCTGGTAATG GTTTCTTCAG ACGTGCCTCG GTACTGTGCA GCGCTCGCCA 6480

GGCATCTTTT AGCTGTGCTA ACCGAAAGCT AATTGCCGTA TCAAACAAGC GCTGCACCAG 6540

ACGCTGACGT TTCAGGATAA GGTGATAATT CAGCGGGGGT TGATTCATCA GGAGCTGTTG 6600 

TTGCGTTGCC CGCGGATTGT CTGCGGCAAG TGGTGTCACC GGATATTTTC CTGTATTGGC 6660 

ATCGGCCAGA ATACGCTGTC CTTTCGGACT TAACAGGTAG TGAATAAAGC GACGGGCTGC 6720 

ATCGACGTGT GGGCTTTTCC TGAGAATTGC AACGTAGGTG GGGGATACCG CAGACCGGGG 6780

GAAATAGGTA AAAGAGAGAT GGGGGTCATT TAACAGTAAA TTAGCATAGT TATCGATAAC 6840

GGGGCCGGCA ACGCCGAGTC CGCTTTTTAT TTTANTCGCT ACGCCAAAAC TGCGGGAGGA 6900

GATTGTCACC AGGTTTCCTG CACTTGTCAG CAACGTTTCC CATCCTTTCA CCCAGCCTTT 6960

TTGCTGTAGT AATGACTCAA CCATTAAATG GTTAGTATCT GAACGCGACG GACTACTCAT 7020 

CAATAAAGCG TOCTGATAGA TCGGCAAAGC AAGATCGTCG CAGTCAGCAG GGGCAGGAAG 7080

GTGTTTTACA GAAAGCGCCG GACGATTAAT GAGCAGACCA AAACCTGATA TTGCTACTGC 7140

AACGGAGGTT GCACGGATCG ACTCCGGCAC CAGGTTTTGG CTTTCTGCGG GTGGATCATC 7200

AAACGGGGCC AGTTTCTGGT GCTCCTGAAG GTGCTGGAGC AGCATTGGTG ATGAAGTCAG 7260

GATAAGATCG ACGTTTTCTA CGTTGGCCGT ATCAAGCAAo TGTTCCAGTG AGGCACTGGT 7320

GCGGTTAAGC GTACGGATCA TTACCGACTC AGGCTCTGTT TGCCAGCGCT GTATTATCCA 7380

CGCGGTAGCT CCGGGTGAGA ATGTGGTGGC CATCACCAGT TCATTTCGTT GAGCCCTGAC 7440

GGCCCCGGCG TCCATCAGCA ACAGTAAAAG AATCATGGTT TTGATGCCGA TTTCGCACCA 7500

GCTAAAAAAT CGGTTTGTGA TCCAGGTCAT AAATATTAAT ACACCGCAAA AATCGCATTG 7560

AGACAAAAAT TACCCGTTTC AGACATTZGT CTGATAACAC GTCTGCTCAA AGAGACCGTT 7620

AATATATTAA TGAGAGATTA CCCGATAATC AGCATGAGAT TTGTTAATAT CCGCACATGC 7680

TAACAACAAA CCAGATAAAG CATAAATCTA CCTTGTCTAT GCATCAATAA AATGGGTCAA 7740

AAACAGGCTT TGATTTTATT ATTTTGTGTC AATTGTGACA CATTTTTTCA GTTTGATGTT 7800 

TCATYTCAAT TATATGACTC TCATTGTCAG AATACTCCTG ATGTTCATAT GAATATAAAA 7860

TACAGGTGAA GACATGTTAT CAATATTTAA AACGGGGCAA TCGGCGGATA GTGTTCCGGT 7920

GGAGAAAATT CAGGTGACAT ATCGTCGCTA TCGTATGCAG GCGTTACTTA GCGTATTTCT 7980

GGGGTATCTT GCATACTATA TCGTGCGTAA TAATTTCACT TTATCGACGC CTTATCTTAA 8040

AGAGCAATTA GATCTCAGCG CCACACAAAT TGGCGTACTG AGTAGCTGTA TGCNTATCGC 8100 

CTATGGTATC AGCAAAGGAG TGATGAGTAG CCTTGCCGAT AAAGCCAGTC CGAAAGTCTT 8160

TATGGCGTGT GGGCTGGTGT TATGTGCCAT CGTTAACGTT GGCCTGGGAT TCAGCACTGC 8220 

ATTCTGGATT TTTGCGGCAT TGGTTGTTCT GAATGGTCTT TTCCAGGGAA TGGGCGTTGG 8280 

TCCTTCTTTC ATCACTATTG CTAACTGGTT CCCTCGCCGG GAGCGTGGTC GGGTTGGTGC 8340 

TTTCTGGAAT ATCTCTCATA ACGTCGGTGG TGGTATTGTT GCCCCTATTG TTGGTGCCGC 8400

TTTTGCCCTA CTCGGCAGCG AGCACTGGCA AGGTGCGAGC TATATCGTTC CGGCCTGCGT 8460

GGCTATCGTT TTTGCGGTAA TTGTGCTGAT TCTCGGTAAA GGTTCCCCAC GTCAGGAAGG 8520 

TCTACCCTCT CTGGAAGAGA TGATGCCGGA AGAAAAAGTC GTCCTGAATA CCCGACAGAC 8580

GGTAAAAGCA CCAGAAAACA TGAGCGCCTT TCAGATTTTC TGCACTTATG TATTACGCAA 8640

CAAAAATGCC TGGTATGTCT CACTGGTTGA CGTATTTGTA TACATGGTGC GCTTCGGGAT 8700 

GATTAGCTGG TTGCCTATTT ACCTGCTGAC GGTGAAACAT TTTTCTAAAG AACAAATGAG 8760

CGTCGCGTTT TTATTTTTTG AATGGGCCGC AATCCCTTCC ACGCTACTTG CCGGTTGGTT 8820

GTCAGACAAA CTGTTTAAAG GGCGTCGTAT GCCATTGGCG ATGATTTGTA TGGCGCTGAT 8880 

TTTCATTTGC CTGATTGGCT ACTGGAAAAG TGAATCGCTG TTTATGGTGA CAATTTTTGC 8940

TGCCATTGTT GGTTGCCTGA TTTACGTTCC ACAATTTCTG GCTTCCGTTC AGACTATGGA 9000 

GATCGTTCCC AGCTTTGCTG TTGGTTCTGC AGTAGGCTTA CGCGGTTTTA TGAGCTATAT 9060

CTTCGGTGCG TCTCTGGGCA CCAGCCTGTT TGGTATTATG GTCGATCATA TTGGCTGGCA 9120 

TGGCGGATTT TATCTTCTTG GCTGCGGTAT TATTTGTTGC ATCATTTTCT GCTGGTTATC 9180 

ACATCGTGGT GCAATTGAAC TTGAACGTCA CAGAGCCGCA TATATAAAAG AACACTGATT 9240

ACCTTCCCCA GGGCCGTCTC CCTGGGGAGT GGAGTATATT ATGATTTATA AGATATCTGG 9300 

AAATCAGAGA TTAATATGGA AATTTTATAA GACTGATTAC AATAAATGGA GATGGTATTG 9360

TCATGAGAAA AATGGATATC TTTTGTCTCA ATCAGATAAC GCATATAATT CGCAATTGTT 9420

ATGCATTGAA AATGCTAAAA AACAGGGATA CTCAGACGAA TCGGTCTTGC CACTTTTTCT 9480

ACATATTTCC TATATTCAGG AAAAAGGCTG GAAATGGTAT CAATGTTATG ATTGTGGATA 9540

TATTGTAAAA GAAACCTCTG TTTTTTTTTC GACATACCAG GAATGTGTCA ATGATGTTAA 9600 

AAGGAATATA CTAGCATCTA TGTGTAGTGG TTGTAGTGGC ACAGTAAATT TGGCCACCTG 9660 

ATTAAAGGTG ATATTCTCAC CACAACATAA AACAACAAGA AAACAAAGCG TACCTTCTCT 9720

CCTGAGTTTA AACTGGAATG CGCCCAACTT ATCGTTGATA ACGGTTACTC ATACCGGGAA 9780

GCTACTGAAG CTATGAATGT TGGTTTCTCT ACTCTGGAGG CATGGGTACG TCAGCTCAGA 9840

CGGGAACGTC AGGAGATCAC GCCTTCTGCT GCAGCACCAC TCACATCAGA GCAGCAACGT 9900 

ATTCGTGAGC TGGAAAAGCA GGTGCGTCGT CTGGAGGAAC AAAATACGAT ATTAAAAAAG 9960

GCTACCGCGC TCTTGATATC AGACTTCCTG AATAGTTACC GATAATCGGG AAACTCAGAG 10020 

CGCATTATCC GGTGGTCACA CTCTGCCATG TGTTCAGGGT TCATCGCAGT AGCTACAGAT 10080 

ACTGGAAAAA CCGTCCTGAA AAAGCAGATG GGGTGTATTA CACAGTCAGG TACTTGAGCT 10140

ACATGGCATC AGCCACGGTT CGGCCGGAGC AAGAAGCATC GGCACAATGG CAACCCGGAG 10200 

AGGCTACCAG ATGGGACGCT GGGTTGCTGG CAGGCTCATG AAAGAGCTGG GGTTGGTCAG 10260

CTGTCAGCAG CCGACTCACC GGTATAAACG TGGTGGTCAT GAACATGTTG CTATCCCTAA 10320 

AAGCAACAGC AAACAGCGAC CACTGGGGAG CCCTGCATTG CGGGATTGTA TTGTTCAGCG 10380 

GGGCATGCTG ATGGCGATGG GGCCGAGGAG AGTGATTTTC ATACGCTCTC ATATGGTTTT 10440 

CGACTTGTGC GAAATGTCCA CTACGCGATC CGCACGGTGA AACTGCAACT CACCGACTTC 10500 

AGGGGAAAGT CGGGGCCGCT GGGTAATCTC ACATAAAAGT TCTTCGGTGT CATAAACAAC 10560

GAGAGTATTT GATTCCTTTA TGGTGGCCTG GTGCAGAGCT GCCCTTTCCC AGGACCTCCA 10620 

TATAATTTTT GTAGCGGCAG TCAGTGGCAC ACTCAGTTAA CTACTTTCAC TTCAGTGACT 10680 

TTGAATGAGT CAGGGCTGGC GTTAAAGGTG TTAATGAAGG CTTGTATTTT CCACTTCTGG 10740 

CCTGGTTCAA GATTGGATGC TGTGTCGATT GTTTGACCGA TAAGGACTCC ATCTTTTAAN 10800 

AGATTAAATT TTACATAAGC ATTTTTGACA ACAGAGTTTG ATTTATTTNC AGCATAACCC 10860

ACAATTGCCT TGGTCCCACT TGGGGTGTTT TCCACATGAA GGTTAG 10906
(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7430 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

ATGGTTATTT TTATTTCCTG CACCTTGCTT CATTTGAAAT AAAAACATAT GCATACGACG 60 

CTGCCATTGA GCAGAAAAAT ACAGGAATTA ATGTTATGAG TTAACCATAA TACCTGTGTT 120 

ATGAATATCT GACATAAACA AGAACAATTC ATATCTTCTG TATTCAGCAG AATAATAAAA 180 

GTTCGTCTGC CATTCTCAAA CTTATTCTTC GGAATACGTT GTTTCATGAA AGAAGGGGCC 240

GGAATAAAAG CTGGTCACCG TAATGCTAAT ATTAATGCAG ACTACCGCCT TCTGGAATTA 300 

ACAGTCATCA ACCAGCACAA ACCATTAGCA ATCAAACAAA TTTTAATTAA CAAAATTTTA 360

GCTAATACAA TTACTGCATT AACCACTCTG CAGTTTGCCT TCTCAATAAG TTACAGATGC 420

CAAACAATAC TCTTTTATAT GTTATAACAT AAZACAAACA ATAAATAAAG AACAGACGGC 480

ACTCCATTTC TCCACGTAAG TGAGCCATCA GAATCGCTTA TGAATGTGTA CGGCAGACGT 540

ATACTCGTGT TTTACTGCAG CAACCGGAGC AAAAGTTGCA CTTCCACAGC CTGGGTTAAG 600 

TTTTTCATGC TTGTGGGCTC GTCCTCCCTC CATTTCCACC GCGGGCAAAC AAGGCCATCT 660 

TTTGTCTGGC CACACAGCAG ATGGAGAGTC GAATTATGCT GTCTGACGAC ACCGGGAACA 720 

AATATGCCAT GCCTTCGCAC AATGAACCCG GGCATCATCG TTTTATCTTT ATAATCGAGA 780

CAGGTATGAG GGAAAGTCGG ATGATAAGCA GATAGTGAGT GAGGCGCTGG AACATGGCGC 840

TCTGGCAAGA GAAGTGTCAC AGGTTACCTG ATGATATGGG GCAACCTGAT ATCTACTTAC 900 

TTTTTTGCCT ACTCTCTTAC TTCATGCCAG CAGCGAGGGT ATCGACATTG TGTTTGAACG 960 

CTGCCGTGTA GGTAGCAGCG AGGCCGCTAC TGTCGGTAAG TGCTTCCGGA TAAAGCTCTC 1020 

CTCCCGCTTG TGCACCACTG GCATTGGCGA TTTGTTTCAC CAAACGGGGA TCTGTCTGGT 1080 

TTTCGATAAA GTACAATTTT ACGTGCTCTG TCTTAATTTG ATTAATCAGT TTCGCCACAT 1140 

TTTTACTGCT AGCTTCCGAC TCAGTGGAGT ACCCCACTGG CGACAGAAAG CGAACCCCGT 1200 

AGGCGGCAGC GAAATACCCA AACGCATCAT GACTGGTCAG TACTTTACGT TTTTCTCTTG 1260

GAATAGCAGC AAACGTCTGC GTGGCGTAAT TATCCAGTTG CTTCAACTGC TGGATATAGC 1320 

TGTCACCCTG TTTTCGATAA TCGCTGGCGT GCTCCGGGTC TGCTTTGCTC AGGCCATTGA 1380 

CAATGTTGTG AGCATAGACA ATACCGTTTT TCATGCTGTT CCAGGCGTGC GGATCAGTGA 1440

TGGTGATCCC ATCCTCTTTC ATTTTCAGTG TATCTATTCC GTTAGACGCG GTAATTACCT 1500 

CACCTCTGTA GCCAGAGGCT TTCACCAGAC GGTCCAGCCA TCCCTCCAGT CCCAATCCAT 1560 

TGACAAAGAC AACATCCGCC TGTGCCAGCG TTTTGCTGTC TTTCGKCGAC GGTTCAAATT 1620

CATGTGGATC ACCATCCGGT TGCACCAGAT CAGTGACATG AACGTATGGG CCGCCAATCT 1680 

GGCTGACCAT ATCGCCCAGT ACCGAGAAAC TTGCGACCAC ATTCAACTCT TTTGCAATCA 1740

CCAGTGGGCT CACTAGTAGG CTGGAGAGTG CCACAACCAA AATGGACCGT TTCATCTTTC 1800 

CTCCTTCATC TCGTTGCTAT GTGTAAAAAC ACTTCTTGTC AGCGACATCT GCATAACATG 1860

CCGCCATTAG AGCCAAACAG AACTGAAAAG CAGAAAAACA GAGTGCTCGT GAGGATGACT 1920

GCAGGACCTG CAGGCAAATC AGCGTAATAA GACCAGATCA GTCCAACCAG ACTGGCGCAG 1980

GTACCAATAC CCACTGCAGC TAACAACATG ATGGACAGAC GTTGACTCCA GAAACGCGCG 2040

CTGGCAGCCG GTAACATCAT AATACCGACT GTCATCAGGG TGCCAAGTAG CTGGAAACCT 2100

GCCACCAGAT TGAGTACCAC CATTGACAAA AACAGGCAGT GGATCAGCGC CCGCGACCGA 2160

CGTGACAGAA CTTTCAGGAA AGTGACATCA AACGACTCAA TCACCAGCAC CCGGTAGATC 2220

AACGCCAGTA CCAGAACCGA ACCGGAACTA ATTATGCCGA TAGTGATCAG AGCATTGGCG 2280

TCAATAGCCA GAATGGAACC GAACAGCACA TGCAGCAGGT CGACACTGGA GCCACGCAAA 2340
GAGACCAGGG TGACGCCAAG TGCCAGCGAG CCGAGGTAAA ACCCGGCGAA ACTGGCGTCT 2400

TCTCTCAATC CAGTGCGGCG GCTGACCACA CCAGACAACA TCGCCACAGA CAGCCCGGCA 2460

ATGAAGCCAC CGACTCCCAT CGCAACCAGC GACATGCCCG ATACCAGGTA GCCAATTGCT 2520

ACTCCCGGCA ACACCGCATG GGACAGTGCA TCACCGATCA GGCTCATACG GCGCAGTAGC 2580

AAAAAACAGC CAAGTGGCGC GGCGCTCAGG GTCAACGCCA GACATCCGAC CAGCGCCCGA 2640

CGCATAAAAC CGAAATCGCC AAATGGCTCG CACAACAGGT GCAGTAACAT CATGGCAGCA 2700

GCCCCTGCTG CGGTGGCGTG GCTGCAGCCG TGAGGGAATG GAGTATATCG GCACTTCTCC 2760

CCCATCGGTG GCCTTCCGCA CTGAGCATCA GTACATGAGG AAAGTATTTT TCTACCTGTT 2820

CCATGTCATG CAACACCGCA AGAATTGTAC GTCCTTCCAG ATGTAGCTGC CGAATAACAA 2880

CCAGCAGAGT ACGGATAGTC TGAATATCAA TGCCAGTAAA TGGTTCATCC AGCAGAATAA 2940

CCGACGGCTG CATCACCAGC AGTCGTGCGA ACAGTACGCG CTGTAACTGA CCACCGGAAA 3000

GTGTGCCGAT GTGCATCGGC GAAAATTCTG TCATACCGAC GGTATCCAGC GCTTCGATAG 3060

CTTTTTTTCG CCATAGACCG GAAATACGAC CGAACATCCC GCTGTGTGGA ATACATCCCA 3120

TCAGCACCAG ATCGTTAACA CTCAGTGGAA ACTGGCGATC AAATTCAGTC AATTGGGGCA 3180

AATAACCTAA CTGGCGTTGC CCCTGCGGTG CCATGCAGAA GCAACCACCC AGAGGTGGCA 3240

GCAGACCGGC CAACGTTTTA AGCAAGGTGG ATTTACCTGT GCCATTCGCT CCGATAATGG 3300

CAGTCAGTGA ACCGGTGTCA AAACATCCAT TCAGCGTACC CAGCGGGTGC TGTCCCGAAT 3360

AGCCAAATGC CAGTGAATGT AATGCGATCA TGTCAGTACC ACCGCCCAGG AAATAAGAGT 3420

CCATAACACT ACCAGCAGCA CACCGACGAT ACCCAGTCGG GCTATTGCGG AAAAAGCATA 3480

AAGACTGACC ACAGTATCCC CCATCAAAAT TGTTATAGTA TAACATTATT GCTTTATGGG 3540

TGCCGATGAT AGGTAAGAAA ATGTGTCATG GCTTCTGCAG CGTAAGCATA CAGCGAGAGC 3600

AGTATTGACA GGGATGCGTT AGTCATTTAG CAGTGTAATG CGCTAAATAG NTGCGCGGAA 3660

TAGTAGATCA CTTTGAGGGT ACTCAGCCCG GATTGTGCGC TCTGATCAAT CGCCAAATCA 3720

AAACAAATCA CCAACCGAAC TGAGCAATGC CGATCATAGC ACCAATTTCC CGTGACGAAC 3780

GACACCGGAT GCAGAAAGCC ATCCATAAAA CACACGATAA AAATTATGCC CGCAGACTGA 3840

CTGCCATGCT GATGCTGCAC CGGGGCAACC GTATCAACGA CGTTGCCAGA ACGCTCTGCT 3900

GCACCCGTTC ATCTGTTGGA TGCTGGATTA ACTGGTTACT AAAATCATT2 CCTGCCGGGC 3960

GTGCCCATCG CTGGCCATTT GAGCATATCT GCACACTGTT ACGTGAGCTG GTAAAACATT 4020

CTCCCGACGA CTTTGGCTAC AAGCGTTCAC GCTGGAATAC AGAACTGCTG GCAATAAAAA 4080

ATCAATGAGA TAACCGGTTG CCTGTTAAAT GCCGGAACCG TTCGCCGTTG GTTGCCGTCT 4140

GCGGGGATAG TGTGGCTAAG GGTTGTGCCA GCTCTGCGTA TCCGTGACCC GCATAAAGAT 4200

GAAAAGATGG CAGCAATCCA TAAGGCACTG GACGAATGCA GCACAGAGCA TCCGGTCTTT 4260
TATGAAGATG AAGTGGATAT CCATGTTAAT CCCAAAATCG GCGCTGACTG GCAGTTACGC 4320

GGACAGCAAA ACGGGTGATC ACGCCGGGAC AGAATGAAAA ATATTATGTG GCCGGAGCGC 4380

TGCAGTGCAG GACAGGTTAA AGTCAGCCAT GTGGGCGGCA ACCGCAAAAA TTCGGTGCTG 4440

TTCATCAGTC TGCTGAAGCG GCTTAAAGCG ACATACTGTC GAGCGAAAAC CAGCACGCTG 4500

ATCGTGGGCA ACAACATTAT CCACAAAAGC CGGGAAACAC AGGGGTGGCT GAAGGAGAAC 4560

CCGAAGTTCA GGGGCATTTA TCAGCCGGTT TACTCGCGAT GSGTGAACCA TGTTGAACGG 4620

CTATGGCAGA CACTTCTCGA CACAATAATG TGTAATCATC AGTACCGCTC AATGTGGCAA 4680

CTGGTGAAAA AAGTTCGCCA TTTTATGGAA ACCGTCAGCC CATTCCCGTA GGGGAACATG 4740

GGCTGGCAAA AGTGTAGCGG TATTAGGAGC AGCTATTTAG GAGAACAGCT CGCTGACCCG 4800

GTTGACTATG AGTCAAGCCC ATGAGGAAGA TAGCTTTGTG GATCAACATC GTTCAGTCTG 4860

GACGTCCCAA TCCAGCCACC AGCCACCAGC CACCAGCCAC CAGCCACCAG CCACCAGCCA 4920

CGAGCCAGGG TACAGTGCCA TCCCGACCTC CCCACGTAAA CCCAGGGACA GGCTAAAGGC 4980

AGAAAATGGG GAAGGCAGTA TGAGTCTCCG TGACAGAGAT GCGGGTACCT GATGGGAGTG 5040

AGATCATCTT CCCCTCCCGG TCAGTTCGCG GATCAACAGG GTGAGCAGCT CTGGCGAAGG 5100 

TTTTTCCAGC GTCATTTTAC CGTAAGGAAA TTCAACCTTA CAGGAACTGG CACAGACTGT 5160 

GCACTAAGTG GCAGTGGATA AAAGCGGAGT AAGAGCCGCC ACAGGCTCTT TCTGCTCATC 5220 

AGGCATTATC TCAACAGGTA ATAATTCAAC GCCAGCGCCA GAAGAGGTTG TTACCGGAAG 5280

ACGCCGCGCC CGCCTTCGTT CAGCCAGAGC CTGAGCCATT TGAGCAGGAG GTTATCATTG 5340

ATATCGTGTT CCTGGTCAAT ACGGGCAACA GAGGTGCCTA CGACGTTTTT TGAGTTCGGT 5400

TATCTATTGA CTTAACTCTT TGGCCAGTAA TGCTGCAGCC CCCGTGCCAT GAATAAACGA 5460

GTGGTCGCAG ACCACGCAAC ATGCAACATC ATTCAGATGG CCCGCTAATA TTACAGGTAA 5520 

TTCAGAATCA GGAATACTTT TCCGGACCAT TAAAAGTTGT GAGTGACGAT CAGTTGACTC 5580

ATCACTTTCA GTCGGGCTCG GTGGAACAGG ATGAAGACAA TGTAATCTTA TTCTCAAACC 5640

TTCTGGCATA TGAACTATCA TATTGATGGA GGGAATTTGG TTGTGCACTA AATACTGTAT 5700 

TTCTGCATCA CTTAAAATCA TCCAGGAATA TAGATGCATG CGATATAAAT TTTCTTTCGG 5760

GCATTTCAGG GAGTATGGAA ACACTTCATC CAGAGGTGAT AGTTTCTGTT CCCACCATAA 5820

GTTTGTTTCA AGAAGAACAA GTATATCAGG TTTTTCTTTA TTTATAAGTT CAAGAATGGG 5880 

TATATATTTT TTATTGGTCA TAAGAACATT GAATACCAGT ATACTTAAAC CCAGAAATCC 5940

ATCAGAGTCC TTTATTTCCT TTACCTGCTT GTTGCCAATT AGTGTATAAG GAATTATCCA 6000 

TACGAACTGG TAAGCGACAC AAATTAAAGT TATTATCCCA AGAAACAACT GTGTAAATAA 6060

GTCAAGAAAA ACAACAGACA GAAAAACATT CAAAGTAGAG AGCAAAAGTA TCTGTAGTCG 6120

GGGAAAATGC CATCCCCCGA CAACCCATGA TGTATTAGGG GAAACAGGGA TAAAAGTTAT 6180 
GACTGCCAGA AGGATAGCAG TAAAAATAAA AACACAAGTT ATCACAAATC GCTCCTTGTT 6240

CTGAACCGGA ACACAAAACT GTCATATACG TTTCAAAAGT AAAAATACAC TGCTGCCACA 6300

AGATTTACAG CGTAACCGGA CAGCATATCC TGATTACGGA CAATCCATGA AACCGCCTCA 6360

CCAGAAGCGT CCATCACATC CGTTTTTTCC CTGTTTTATA TTCCCCGAAA CATTTTATTT 6420

TCAGGAATCT CCGGGCCTTT ATCCCGCATC ATTGCAAAAT GGCATCTGAA TCGATCATGA 6480

TTTGGCATCC ATCTCCGATC ACAGTTTGGC ATCACAATCG ATCACGATTT GGCATGCTTC 6540

CGATCATTGA TTAGCATCCT GCCAGTCACT CCGGGAATTA ACTCTTTTCG CCACAGTCTT 6600

CATTGCCGTG TTTAAACCAA TGGAGACGGC AATGTCCAAA AAGAGAATAT CCAGGAGCAC 6660

TATCGATACC TGTTTTAAGA TCCTTCAGCT CAAGTTCGAC CAGAAGCTGG CTAACCGTTG 6720

TATCGGACTT GCAAAACACC AATGGGGATT GATCTCTATT TTGCGACACA GACGCATTAT 6780

CAATACATCG ATGGTGCGAT CAAATACCTC AGTGGTCTCA CCGTGGATCA AATCCAGCAA 6840

TTGCTCACAG ATTAAGACTC GTCGGGAGTT TTGAGCCAAC ACCAGCAGTA ACCCATATTC 6900

ACCTTGAGTG AAATCTACAG GCTGTTGATG AGCATCAACC AGCACGTAAC GGTCCGGGAT 6960

CAAGTGTCCA GCCGTTAAAA AAACCACTCT ACTACCCTGC TCGACCTAAG CCTCGGCGTT 7020

CAGCCGCCTG AACGGGTATG GCAAGGGTGA AAAGAAACAG CATCCCCACA GTACCGACCA 7080

TGATGCTGGA ACAGAAAGCA TTCGCACCTC TCTTAGAATT AGACAGTGCG 7140

TACAGGATAC GTAAGACAGG GTGACGGGGC GGCGATAAAC TCTATTTACA AAGCTGAAAA 7200

TTTTCTGACG ATGAAAAACT ATTCAuACAAG GTTATCTGAG GCGTTAAAAT AACCAGCTCG 7260

ATTAACGACT AACTTGAGGT GAATATGAAT TTAAAAAATA TAATTTTAAG TACTGTTTTA 7320

TCAATCGCTA GTTGTCATGC CCTGGCTGTA GGTAATTCTC CAAATAGCGC TATCTAACCT 7380

TCATGTGGGR AAACACCCCC AGTGGGGAC3 AAGGSCAATT GGTGGGGTTA 7430



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6681 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

AGATTATTCT GGCTCAGATT CATTTTTCAT CAGTCGCTTT CCCCTATAAA CCGTAAGGTT 60 

CCATAGTGTC GACGCTCTCG CTTAATTCCC ATATCGTCGA TAGTCTTATT AGCCGCTTCT 120 

GTCAGGTCAG AAAAAGTATC ACGCTTCTTT GGGAGTTCAA GTCAGATTTC TCGCCGTCGG 180 

GCGATGCGCT CAAAATGTTT GTCTGTATGG GGTCGCTTCA TCACGTCAAG CCATCGCGCT 240

GCCGCTCTCC GCCAGAGTAC AAGCTCTTCC AGTTGTTCTG CTTTTTATCT TATCTGTGGC 300 
GATGCAGTAT CCTCCTCCGT TTGTGTAAAT CGTTGAGTGG TGAATCACGC AAAGGGGCTT 360

CTTTTTTCTG ATCTATCCCC ATATTCTTTA GCGTTCTGGT CGCAGCATCT CTGATGTCGC 420

AGACACTGAA CCTTTGTATT TTCCATGATC TTGTGGAGTT TTCGATACAT CTGCTCCGAT 480

GCTGGGTTAT AAAGATCCGC TCTTTATCAT CCTTGGCTTG TGTAAGCAAT TCTCCCCAAC 540

GTTCTGCTGC ACGCCGCCAT AACTCTCTTC TTTCCAGTTC CTCAGCTTTT TCATCATGTA 600 

CCATTCGTGT ATCCCCGTTT ATCCAGTCTG AACCGCACCG GGTTTCCTGG AGAATGTTTT 660

CTCTGTGAAC TCAGGCTGCC AGATCATCGT TTCCGATGGA AGCATAATAA GCTTTTTCTG 720 

CTTCTGCCGG ARGAATATGG CCCAGCTTTT CCAGCAATCG TCGATTGTCA TACCAGTCCA 780 

CCCACGTTAG TGTGGCCAGC TCCACTTCTG TCCGTTTTTT CCAGCTCTTA CGGTTATTAC 840

CTCCGTTTTG TAAAGACCAT TGATGCTCTC CGCCATTGCG TCGTCATACG AGTCGCCTGT 900 

ACTCCCTGTT GATGCCAGTA ATCCGGCTTC CTTAAGCCGT TGCGGACACA TAATGAGAGC 960 

CTTTATCGCT GTAATTGTCA ACGACGGATG AAAAGTGATC CACTTATATC TCCACCAACG 1020 

GCCCAATATT GATCCACCGT TTTACTCAGG ATTAGCTTCT GCTATAACCC CGGCCTTTCG 1080 

TTTCTGTCTG AGTCGATAGC TTTCTCCTTT GATTTGAACG ACATGTGAGT GGTGTAAGAT 1140

ACGGTCCAGC ATCGCTGAGG TCAGTGCTGC ATCACCGGCG AACGTTTGAT CCCACTGCCC 1200 

GAACGGCAGA TTGGATGTCA GGATCATTGC GCTCTTTTCG TAACGTTTAG CGATGACCTG 1260 

GAAGAACAGC TTTGCTTCTT CCTGACTGAA CGGCAGATAG CCTATTTCAT CAATGATGAG 1320 

CAGGCGGGGG GCCATTACTC CACGCTGAAG CGTCGTTTTA TAACGGCCCT GACGTTGTGC 1380 

CGTAGATAAC TGAAGTAACA GATCTGCTGC TGTTGTGAAG CGAACTTTGA TACCTGCACG 1440 

GACTGCTTCA TAGCCCATCG CTATTGCCAG ATGGGTTTTC CCCACACCTG ATGGCCCCAG 1500 

TAATACGATA TTTTCATTAC GTTCTATGAA GCTGAGTGAG CGTAACGACT GGAGTTGCTT 1560

CTGCGGTGCT CCGGTGGCGA ATGTGAAGTC ATACTCTTCG AACGTTTTCA CCGCCGGGAA 1620

GGCTGCCATT CGGGTATACA TCGCCTGTTT ACGTTGATGA CGTGCCAGTT TTTCTTCATG 1680

AAGCAGATGC TCCAGGAAGT CCATATAACT CCATTCCTGG TGTACTGCGT GTTGTGACAG 1740

CGCAGGCGCT GCGCTTATAA GGCTTTCCAG TTGCAACTGC CCGGCGAGCG CCATCAGTCG 1800

TTGATGTTGC AGTTCCATCA TCACGCCACT CCTCTGCAGA ATGAGTCGTA GATGGAGAGT 1860

GGATGATGCA GGGGGTGTTT GTCGAAGTTC ACCAGATTTT CATCAAGATG CACGTCATAC 1920

TCTTTTTTCT CCGGAGCAGT GCCAGCATGG ACTGCTGTCT TCGAGCCAGC GATCGCAGGG 1980

ACGGGCCTGG ATTGTTTCAT GCTTTCGTTG GTTAGCGACA TCGTGCAGCC AGCGCAGACC 2040

GTGGCGGTTG GCTGTTTCAA CATCGACAGT GATCCCCATC GGGCGCAGGC GAGTGATTAG 2100 

TGGGATGTAA AAACTGTTAC GGGTGTACTG CACCATCCGT TCCACCTTAC CTTTAGTCTG 2160 

TGCCCTGAAG GGGCGACACA GTCGGGGAGA GAAGCCCATC TCCTTGCCGA ACTGGCACAG 2220
CGAAGGATGG AACCGGTGCT GACCGGTCTG ATATGCGTCA CGTTGCAGAA CCAGAGTTTT 2280

CATATTGTCA TACAACACTT CGCGCGGCAC ACCACCAAAG AAGCGGAACG CATTACGATG 2340

GCAGGTCTCC AGCGTGTCAT AACGCATATT GTCAGTGAAT TCGATGTACA GCATTCGGGT 2400

GTATCCGAGA ACAGCAACGA ACACGTGAAG CGGTGAGCGA CCATTACGCA TAGTGCCCCA 2460

GTCAACCTGC ATCTGTCGTC CGGGTTCAGT TTCGAACCGA ACGGCAGGCT CCTGCTCCTG 2520

AGGAACCGAG AGAGAACGAA TGAATGCCCT GAGAATGGTC ATTCCGCCAC GATATCCCTG 2580

GTCTCTGATC TCGCGAGCGA TTACCGTTGC CGGGATTTTG TAAGGATGAG CATCGGCGAT 2640

GCGTTGACGA ATATAATCCC GGTATTCATC CAGGAGTGAA GCAACAGCAG GTCGCGGCGT 2700

ATATTTTGGC GGCTCAGATT TTGCCTGCAA ATAACGTTTA ACCGTATTGC GGGAGATCCC 2760

CAGTTCTCTG GCAATCGCCC GGCTACTCAT TCCCTGCTTG TGCAGGATTT TAATTTCCAT 2820

AACTGTCTCA AAAGTGACCA TAAACTCTCC TGAATCAGGA GAGCAGATTA CCCCGTGGAT 2880

CTGATTTCAG GCGTTGGGTG TGGATCACTA TTGCACCGTT CGTGACAGTA ATGGATTGTG 2940

TCAGACGGAC GACGGGCCCA TAAGGCCTGC TCCAGTGCAT CCAGCACGAA TGTTGTTTCC 3000

ATGGACGATG AGACTCGCCA TCCCACGATG TATCCGGCGA ACACATCAAT GATGAACGCC 3060

ACATAAACAA AGCCCCGCCA TGTGCTTATC CCGGTAAAAT CAGCTACCCA GAACTGGTCC 3120

GGGCGTTCTG CGATGAACTG ACGGTTTACA CCGTTGCATG CGGCAACAGC TTTCCGGCTG 3180

ATTGTCATGC GAACCTTTTG CAAACCCCAT ATATTTCAGA CGATACCGTT CAACGGTAGT 3240

GAACCCACCA TCACCGCTCC CGGTATCCGG CTCATGCTGG TATACCCAGA CATGCAGGGG 3300

TTCCAGCGTA CAGCCAATCT TTGGGGCAAT GGAACAAATT GACGCCCACT ACGAGTCATA 3360

CGACTTTCCA GAACAATACG GAGCGCCCGC TGACGGACCA GCAAAGAGGC GCCATTATTC 3420

TTATTACCTT TAACTAATAA TGCCAATTCA GACCCAAACA CGGCATCATT CGCTTCAGCC 3480

TCTGCGCCAT TAATTAATGC CAGGACTTGG TCAAGAAAGC GTTGCGGTTC GTTTACATCT 3540

GTTGCTTGTC GCAGGTAATA AGGTATTCGT TCAACAAACT GGGAACGTGA TAAAGGCTGA 3600

TGCTCCAGCA AAACCTCAAG CATTGCGGGC CGCAACAAAC GACGCTCAGC ATCAACATTG 3660

GGAAACTTAA CCTCAATGGC ATATGTGGCA AAATACTTAA GTTGCTCCTT AAGCCCCAAA 3720

TTAGGCATAA GAGAATCAAT TGAGCCAGAC GCCACTGCAG CGCTTGATTC AATTGTTTCT 3780

ACATACTCGT AGGAAGGTAC AACAACATCT GGAGCCAATG TTTTAAGCTC ATGGAGTTGA 3840

CGGATAATCG GGGATAGAAC CTCATCAGGA TTACTGAACC AATCAGTGGA CCAAATACGG 3900

CTAATTCTCC ACCCCAAACG CTCCAAAACC TCTTGACGCA AACGATCACG GGCAGATTTA 3960

GCTGAATGAT AAGCCGCACC ATCGCACTCT ATACCCATTA AGTAACAACC CGGATCTTCT 4020

ACCGACAGAT CAATAAAGAA TCCTGCAACC CCACCTGAGG TTCACACTCA AACCCAGCGT 4080

GATTGAGTGC TTCCATTATA GCAACCTCAA AGTCACTATC CGGAGCCCTG CCCGTATACG 4140
TCGTGAGGGA ATCTAATTTG CCACTTTCGG CAAACTGTAA AAAACCTTTC AACGAAATAA 4200

CACCAAATTT ACTGGTTTCA CTCGTCAATA CATCTTCAGA ACGCATTGAA CTAAACACAT 4260 

GCATCCGTTT CTTTGATCGA GTTAAAAGCA CATTCAAGCG GCGCCAGCMA ACATCGGAAT 4320

TGACAGGCCC AAAGCGTTAA TAAACCTTTC CAGCATGCTC AGAAGGTCCA CAGGTAAAGG 4380

AAATAAAGAT TACATCACGC TCATCACCTT GAACGTTCTC AAGTTTTTTC ACAAAAAGTG 4440

GCTCTTCCAT GGCATATAAG GGATGAATTG CATCGTTAAA TTCAGTGCGA TTTCGGCGCA 4500

ATTCATCAAT AGCGCGCTCA ATCTGATCGC GTTGCCTGGA ACTCATGGCC ACTACCCCAA 4560

GAGATTCATC CAGCCGGTGT TGCGCATGAT GAAGTACAGG CTCAGCAACT GCTTGGGCTT 4620

CTTCAATATT GTGTTGATTA GAGCAACGAC GTTTTGATAC ATAAGTAAAT TTGATTCCAT 4680

ACTCTGGAGA CTCAGCATTT GGAGAAGGGA ATATCACCAA ATCACTGTTA TAAAAATGGC 4740

GGTTAGAGTA TGCAATTAAC TTTTCGTGTC GTGAACGATA GTGCCAATGC AAACGTCTCA 4800

TAGGAAACAG TGGCAAAGGA GCATCCAAAA TGCCGTCAGT ATCACTTAAA GCCGCGACAT 4860

CATCGTCATC TTCTCCGGCG GAACTTCGAT CTGAAGTGGG ACACTGAATT TGGCCACCTG 4920

AACAGAGGTG ATATGCTCAC CTCAGAACAA CACAGGTGCT CCAATGAAAA AAAGGAATTT 4980

CAGCGCAGAG TTTAAACGCG AATCCGCTCA ACTGGTTGTT GACCAGAACT ACACGGTGGC 5040

AGATGCCGCC AAAGCTATGG ATATCGGCCT TTCCACAATG ACAAGATGGG TCAAACAACT 5100 

GCGTGATGAG CGTCAGGGCA AAACACCAAA AGCCTCTCCG ATAACACCAG AACAAATCGA 5160 

AATACGTGAG CTGAGGAAAA AGCTACAACG CATTGAAATG GAGAATGAAA TATTAAAAAA 5220 

GGCTACCGCG CTGTTGATGT CAGACTCCCT GAACAGTTCT CGATAATCGG GAAACTCAGA 5280 

GCGCATTATC CTGTGGTCAC ACTCTGCCAT GTGTTCGGGG TTCATCGCAG CAGCTACAGA 5340

TACTGGAAAA ACCGTCCTGA AAAACCAGAC GGCAGACGGG CTGTATTACG CAGTCAGGTA 5400

CTTGAGTTGC ATAACATCAG CCATGGTTCT GCCGGGGCAA GAAGCATCGC CACAATGGCA 5460

ACCCGGAGAG GCTACCAGAT GGGGCGCTGG CTTGCCGGCA GGCTCATGAA AGAACTGGGA 5520

CTGGTCAGTT GCCAGCAGCC TGCGCACGGT TATAAACGAG GTGGTCGTGA ACATGTCACT 5580 

ATCCCGAATC ACGTTGGGCG GCAGTTCGCA GTGACAGAGC CAAATCAGGT ATGGTGCGGC 5640

GACGTGACGT ACATCTGGAC GGGGAAACGT TGGGCATACC TTGCCGTTGT TCTCGACCTG 5700 

TTTGCAAGGA AACCGGTAGG TTGGGCAATG TCGTTCTCTC CGGACAGCAG ACTGACCATC 5760

AAAGCGCTGA AAATGGCCTA GGAAATCCGC AGTAAACCAG CCGGGGTAAT GTTCCACAGC 5820 

GATAGTAATA ATGCCGGTAT CAGTTTTTAT CATCACTCTG TTTGCTGTTT AACCAGACTG 5880

GTGTGATTAC TGATGCAGTG AAGACCTTCC CGCATCCTGA CTCACACAGC GATCGACCCT 5940

TTGTGTCCTG CCCTGGACCT GTCGGTTGCC GGAAGCGCCT TCATGCGAGG CGTCTCCTCA 6000 

CCGATGCGCG TGACTCAAGA AGGGCCTGAC GGTTTGTCTC GTTACTGTCC TGTCCGGGTT 6060 
ATCTGTCTGG AGATTCAACT CTGTTTCCTC ACAGGAGCTC TGTTATGGCA GGTAAAGTTA 6120

CGGAAACCGC TGTTGTGGGT GGCGTGGATA CACATAAAGA TCTGCACGTT GCCGCTGTCG 6180

TAGATCAGAA CAATAAAGTT CTGGGGACCC AGTTTTTCTC CACAATACGG CAAGGTTACC 6240

GGCAGATGCT GGCATGGATG ACTTCGTTTG GGGCATTAAA GCGAATTGGT GTTGAGTGTA 6300

CTGGCACCTA TGGATCAGGT CTGCTTCGCT ATTTACAGAA TGCCGGGTTA GACGTTCTTG 6360

AGGTGACTGC GCCAGATCGG ATGGAGCGAC GCAAACGGGG TAAAAGTGAC ACGATTGATG 6420

CTGAATGTGC CGCTCACGCC GCATTCTCCC GAATAAGAAC CGTCACACCC AAAACGCGCA 6480

ATGGCATGAT TGAGTCTCTG CGGGTATTAA AAACTTGCCG AAAAACAGCA ATATCAGCCC 6540

GCAGAGTCGC TCTCCAGATT ATCCATTCCA ATATTATCTC TGCCCCGGAT GAATTACGTG 6600

AACAGCTCAG AAATATGACG CGCATGCAGC TCATCAGGAC TCTGGGATCC TGGCGGCCTG 6660

ATGCCAGTGA ATACCGCAAT G 6681



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1342 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 



TATTCGCGCA TACGCGTTGC ACATGTTCTT TTGGCGAACG ATCATCGGCA ATACAGAGTT 60

CCCAATGGGG ATAGCTTTGA GCCAGGACAG AATCCAGACA GGCACGCAMG TAGATCTCCG 120

CTGGATTATA AACAGGAATC ACAATAGATA TAACTGGAGG GTGAGTCATA CTGGCAAGCA 180

TCAGACTCAC CWCTTCKTTG CCAGGCAACG AAGGTAATTC CACCGTTTCT ATCCATTCCT 240

CATAACCGAC AGAAGACGGG GTAACGCTGA ACGTYTCGTT ATAGAATGCT TGCAGGCGCT 300

CTATTGACAT ATCGCCATTG TSCATCAATA TGGATTTTWT GATTTTTTCT AGCGGCATGT 360

CACGATAGCT TTGGTGTTCT TTTTGAATGC GAGCCAATAG TGCAGACTCG ACTACTTTCA 420

CATCAACAGC CGCTATTTCA AACTGATTAA TTGCAAATTT TGCTGCCTGT TCTAATGGAT 480

CAAATCGTAA TGCACAAGAG GCGATTCCAG ATAGAACAAC GACTGACGCT GACCGCTCGT 540

TTATATGGCA ACGTTACTGT TTCAAACTCA TTGAACCCTT TACCTGTATC CAAATRTAAC 600

TTAGCTAATC CTTGCTTTGG TTGGGCAATT AATAGAGATA TTAAATTGAT ACCATCCCTT 660

GCTAATATTT GAGAGCTGCT CCAAATCAAT AATGAAAAAT GGATCATTTC CCTCTGCAAC 720

CCAACTTTGT GAATTATCTA TATCTATCGA GAGCTGATTT GTTGCCAGAT AGGGCAGCAC 780

AACTGTATTT TGCATTTTAC TCACTGCAGG AGAAACGTCC CATGCTTCGC ATGGTTTCCT 840

ACCAAGTAAC ATCCCATAAC GCTTAAAATG TTCTCTTGCT GACAACCCGG TCTGTTTCAC 900
ATCCAAATAG TTATGCAGAT ACCAATGTTC ATCAAAGTGA GCTAGGAACT CGTGTTGGTG 960

ATTTTTAACC ATCAGTTTTA TTCTCCCTTA TTGACAGGGA GGCAACTGCG CTGCTCAAAC 1020 

TTCCCATACA TAATGTAATG AAGCAGCGGA TTAATGCCTC CTTGGGCCAC ATCCGGATAG 1080

GTTTGCAAAT ACCAGCGAGT ATCAAACTGC TCACTAGGGC TATAACCTTT ATCCGCCCCC 1140 

ACGCTAATAA AATGCTCAAG AGCTGAGAGC GGAGTGTCTG CAACCTCTGG GTAGCGATGT 1200

TGATACCAGA GTTCATCAAA CAATCCTGAA GCGGCAANTA CTCCGCGGCA CTCTCTGTAG 1260

CTGTTGTTCT GGATGGAGTG TCCTCCTTAA ATGTTCTGCC AAGAGCACGA ACTGGGGCTG 1320 

TAATCTTCCA AGAGACGGTT CT 1342



(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1580 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 



CGAAGGAAGC AGTNTGCNGC CTGCGCTGGC GGAGTTGCGC CTGTTCCCAC CGATGATGCT 60

GTACATGAAT CCTCCGGCGA ACAGAGCGGT GAACTGGAAA CCATGCTTGA ACAGGCCGCG 120

GTCAATCAGG AACGGGAATT TGATACCCAG GTGGGGCTGG CGTTAGGGCT GTTTGAGCCG 180

GCGCTGGTGG TGATGATGGC GGGCGTGGTG CTGTTTATCG TCATCGCCAT CCTCGAGCCG 240

ATGCTGCAAC TGAACAATAT GGTTGGAATG TAATTTACGG AGTTATCACA TGAATTCGTT 300

ATCCCGCACA CAAAAACCAC GGGCAGGTTT TACCCTGCTG GAAGTGATGG TGGTGATTGT 360

TATTCTTGGC GTCCTGGCAA GTCTGGTGGT GCCTAACCTG TTGGGCAACA AAGAGAAARC 420

CGATCGGCAA AAAGCCATCA GCGATATCGT GGCGCTGGAG AATGCGCTGG ATATGTACCG 480

ACTGGATAAC GGGCGTTATC CGACCACTGA GCAGGGGCTT GAGGCGCTGA TCCAGCAACC 540

GGCCAATATG GCGGATTCCC GTAACTACCG TACCGGTGGA TACATTAAAC GACTGCCAAA 600

GGATCCGTGG GGCAATGATT ATCAGTATCT CAGCCCGGGT GAAAAAGGGC TGTTTGATGT 660

TTATACCCTG GGGGCAGATG GTCAGGAAAA TGGGGAGGGC GCTGGCGCAG ATATCGGTAA 720

CTGGAATTTG CAGGAGTTTC AGTAATCAGT GCCTGAACGC GGATTCACAC TTCTGGAAAT 780

CATGCTGGTG ATTTTCCTTA TCGGCCTTGC CAGTGCGGGC GTGATACAGA CGTTTGCGAC 840

CGCTTCAGAG CCGCCTGCGA AAAAAGCGGC GCAGGATTTT CTGACTCGCT TTGCGCAGTT 900

TAAGGACAGG GCAGTGATCG AAGGGCAAAC ACTCGGTGTG CTAATCGACC CGCCTGGCTA 960

TCAGTTTATG CAGCGTCGTC ACGGACAGTG GCTACCCGTT TCTGCGACCC GCTTATCGAC 1020

ACAGGTTACG GTGCCAAAAC AGGTGCAGAT GCTGTTACAA CCCGGCAGTG ATATCTGGCA 1080
GAAGGAGTAT GCGCTGGAGC TGCAACGTCG TCGCCTGACG GTGCACGATA TTGAACTGGA 1140

GTTGCAAAAA GAGGCGAAAA AGAAGACGCC ACAGATCCGT TTTTCGCCTT TTGAACCCGC 1200 

CACGCCGTTT ACGCTGCGCT TCTACTCAGC GGCGCAAAAC GCATGTTGGG CGGTAAAACT 1260

GGCACACGAT GGCGGGTTAT CCCTCAGTCA ATGTGATGAG AGGATGCCAT GAAGCGTGGA 1320 

TTTACCTTGC TGGAAGTGAT GCTCGCGCTG GCGATTTTTG CGCTGGCTGC CACGGCGGTG 1380 

TTACAGATTG CCAGCGGCGC GCTGAGTAAT CAGCACGTTC TTGAGGAAAA AACGGTAGCG 1440

GGCTGGGTAG CTGAAAACCA GACCGGACTG CTCTACCTGA TGACCCGCGA ACAACGGGCG 1500 

GTCAGGCACC AGGGCGAGAG CGATATGGCA GGAAGCCGCT GGKTCTGGCG AACCACACCA 1560 

CTGAATACCG GTAATGCGCT 1580



(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3241 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 



CTTAACCATT ACCCAGCATT TGGTAGTTAA ATAGTCGTTA AAAGCATAAA ACATGGACAT 60

TGTGCCATCC CAGCTAAAGC ATCCATTACC GCCTGACAGG GATAAAAATA AAAAAGCAGG 120

GAACCATTTT TTCATCAGAA ATCACTTCCG TAATTACAGT TATTCATTTA GGTATGACTC 180

AGTTATAAAT CATGCTCATA CTGGCCGTGG TCTGGRAATC CCCGCCATTC AGTATCCCGC 240

TGCCATTACG AAAGGGCACT GAAGTAAAGG TGAACGTTGA ACGTGCTGTG TCCAGACCTG 300

CTGTCACTCC GTAACCATTT CCTGAACCAT TACCTAATAT AAGAGGTGTT GACATTCCTT 360

TTCCCTGATA CAGCGCTATA CCAAAATGAG TTATATTTGT TGCCAGTACA TTATTCTGAC 420

CTCCTCCCAT AGTATTTCCC GTAACTTTTA TCCAGAGAGA GCCACTCTTA TACGGACAGG 480

ATATGCTTAT GGTTTTTGTG ACTTCACCAC GTGAGTTGTC CACGTGCTCA GGATTAATAT 540

TCCCAAAATC AACAACAATA TTCTGCCCGT TATTAATGGT GCATGGGGGG ATATAAACAT 600

TCCCCCTGAT GTTAATCTGC ACATCAGCCA GTACAGCGAC CGATGTCAGA AGCAACGATA 660

TAAATAATGA TAAACGAATC ATTCCCCTCC GGAGAGCGGT ACAGAAAACA TTTTATTTTA 720

CGAGATATAA AATTAACGTA TTTTAGTTGA TACTATTACG AATATGATGC AACCAGCGTT 780

GCTGTTGCAG AGAAAGGACC GGCTATCAAA TTCTGCATAT TCCCTTTATA TCCAAGTTTG 840

GCATGAAGTG ATATAGTTTT ATCTGCATTA TTACCTGTGA TTTTTCCGGG CGTAAATGGA 900

GTCCCTAAAG TTATCGCAGT CCCAATATTT CCTGCATTAC TGTTATAAAG ATAAACGAGT 960

AACCCATCAG AAGATGTGTT TGATGTAT7C TGAACTAAAA TAGCATTGTT ATAAGTGTTT 1020

GTTGCCGTTA TCGTAACCTT CATTGTTCCC AGATTATAGG GACACCGCAT ATTCACAGTA 1080

AACTCTTTTT CGTGATTTCC ATTTTGACTO AGGGTCTGAA TCTCTACATC CTGCCAGTCA 1140

ACAGTTGTGT TGCTTACAGT ACAGGCAGGA ATAATCAGTT TTCCTCTGAA GGTCAGATTA 1200 

TCAACTGCAT GTACATGCTG AGACATTAAC ACTGCCCCCA GCATTACCGG AAGACACAAA 1260

CCTCTTATCT TTTTCATCTG AAATATCCTG TACAAAAATT TTGCTAACGA TATGTCAATT 1320 

CAAACGTGGC TGTTGCTTCA TAATCACCGG GTACCACAGT CTTCGTCCGC AGGCTTCCGG 1380

CGTTGCCACA ACATACGCGC CGAAAGGAAG CTCAAGACTG TTTCCGGTAA CCTTTTCCCC 1440

CTGGCCTTTG TTATGGGAGG TGCCGGGTTT CAGCAGACTG CTGCCATCGG TGTCCAGCAG 1500 

TGCAATGCCT AACCGGCCAG CATTCACTCC GGTTACCTTC AGATGGCCCG GGAGGGCGCC 1560

TCTTCCGTCC CCTTAAAGGT CAGGGTCACA ATTTTGCCAA CTGCTGTTGC ATGGCAGTTT 1620 

TCCAGCCTGA TGACAAACGA CTCTGTCGGC GAACGTCCGG GCGGATACCA GAAATCCCTG 1680

GACGCCCGGG TTTTGAAGAC GACATGTTTA TTCAGACTGT CACGGGACAC ATGGCAGGGT 1740

CTGTCAAGCA GATTAGCCCT GAATGCCACA TCTGAGGCTA TTGCGTGTCC GGCAGACAGT 1800 

GCGGCAAACA GTAAAAGAGC GCCTGTGCTT TTTATCATCA CATTCCCTTA CTCATATTTT 1860

ATGCTCAGAC GCAGCATGGC CGGATTGCTC CTGGCATCAG AATACTCACC CTCCTGTGTC 1920 

GCCCTTTTCC TCCAGGCGGC CAGCATCTCC TCCTGCCGCC GGTCAGGCCG GCACAGTAAA 1980 

AAGGTATCAC CATCGTGTAT AACAAGATGG TCACAGCCGG ATAGCTTACG GTCAGGAAGT 2040

AAAGCACTTC CGCTTCCGGG ACCGGTTACC AGTGAGCCGG AGACTGTCAT CGCAACGCCC 2100 

CGTTTTCCGG GCTGAAGTGC ACCACCGTCC CCACATCCTG CCAGCCTCAG CATCAGAGGT 2160 

GCTCCGGCTG CCGCAGAGTG ATTTTCCGGC CGGAGGYTTA ACGGCACCTC ATTACTCACC 2220 

AGCGTGCAGG GTGAGGACAG CAGTGCACCA CTGACGGTCA GGCTTCCGGT GCGTCCCCCC 2280 

CGTTCATTTA TCCGGTAATG ACGCAACTCA TCTGCAGTAA AGACGTCATC GTATATACCC 2340

CGCTCTTCAG CCCGCAGGAA AGTATGGATG AAACCACTCA GCGACAGTGC AATAAGATAC 2400

AGTACTGCTG TTGTTTTATT CACAACCATA ATATCCCACC CGCATTTAAC CGTTATTGCG 2460

GTACATTATT TCTCTTTTTT CACAGAGCAA CGGCTAGCAT TACAGATAAA CGACAGTACC 2520 

GGGCGACCAC CATAGTCATT AATATAAGAC AGATAAGGGG TATTATAATT TGCCGATTTT 2580

ACTGTCTGCT CTGAACGGGG AGACAGCATC ACGGTTTCAA ACTCACCTTC CTCTGCCTGC 2640

TTTTCACTTC CTCCCAGACC AATAACAGTG ACATAATAGG GCGTTGGGTT TTCAATACGA 2700 

TACCCACCGC TGACTTTGTT CAGAATTAAC TGGTCCTGCC ATACTTCATT TGGTCTGGTT 2760 

TTAATTGCTG CCGGGCGATA AAAAAGCTTT ATTTTGGTCT GTAAGGCTAT CTGCAGTACA 2820 

TTGGCCTTTT CACTCCTCGG CGGTATTTCC CTGAGATTAA AATAAAACAG TGATTCCCTG 2880 

TCCTGAGGAA GTTTACTGAT ATCCGGTGTG GTACTCAGCC TGACCATGCT TTTCGCACCC 2940

GGCTCAAGGC GCTGAACCGG AGGGGTGGCA ATAACCGGCC CTGTAATAAT TTTTTCCTGA 3000

TTTTCATTTT CTATCCATGC CTGAGGAAGA TAGGGCAGTT GTTTGTTATC ATTGGAGATA 3060

TGAAGCGT:A TTGACTTCTC ACTCCCGTCA AACACCGCGC GGGTTCTGTC CAGCGAAACA 3120

GCAGCGTCTG CCCCGGATAT AACAAACAGG GGGATGGCAG GCATCAGAAT CTTTTTTCGA 3180

ATCATAGTTA ATTTCCACAT TCTGTAATTT CACCTGGTCC GGAAAATGGC ATAACCGCAT 3240

T 3241



(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 398 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:

AACGTGGATC TCCAGCTGAT CGGTGCCGTA TTCCAGGTCG TAAGTTTCAC TGATGGTTTC 60 

ACGCGGCAGT TTGCCCGGTT TACGGACCGG TACAAAGCCA ACGCCCAGAC CCAGAGCTAC 120 

CGGAGCGCCA AACAAGAAGC CACGCGCTTC GGTGCCGACA ACTTTGGTAA TGCCCGCATT 180 

TTTGTAACGC TCAACCAGCA AGTCGATGCT GAGAGCGTAA TTTTCGGGTC TTCCAGTAAG 240

CTGGTGACAT CGCGGAAAAG AATGCCGGGT TTTGGGTAGT CCTGAATGCT TTTGATGCTA 300

TTTTTGAGAT ACTCAAGCTG CTGTGCATCG CGGGKCATAA GTGTATGCCT GCTTGTTACG 360

GTGGTACTCA CGGCGCGTTT TTAAACGTAT CAAAAGTT 398


(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17710 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

CAGTTNCNGT TCTCATAGAC AGATTGATAA AATCGTAAAC AGCCCCTAGC ATTCCCGTTT 60 

CCTTTGCACA CATATTCAGG CACGGGGATA AAGTATAAAG AATGTCGTAC TGCTGCTACC 120

AGAGCAATAT TCCCCCCTGA TGGCCGTATC AGAGATAGTA TGCCGGTATT TTGCGGGTGG 180

TTCCCGTCAG GTTATCGTGT ACCTCCACGG TCGTAGTCAC CACCGGCATT CCGGCYTTTC 240

TCAGCCTCAA AACATCAGCT GCAATACGCT GACTGCCGAA CCAGAACAGG CCGTCCAGTG 300

CAGTCACCAG CAACCCCGCC TCCAGCGCAT GCTTCAGCCG TTCACGGGGC GCTTTCACTT 360 

CCCGGGCAAT CTGCTGGTAT G jCGATGATG TGTTTTCATT CCCAATCACC CGGCGAATAC 420 
GATGAGACAG ATGATACCGG TATGTATCCG GCACACCGGA AAGGCTGGCC TTCAGGCTGT 480

ACACGCAGCC AAATCGTTTA TCATTGAACA CCACATTTTT CTGGCTGATG CCCCATTCTT 540

CACGCAGCGC GGCAATCAGT TGTGGTGTAC GGGTAAGCAA CAAGCGAAAA GGCAGTTCAA 600

AACTGGTGAC ATAATTCACA TTCAACAGGG CAATGCGAAG TCGTTCTTCT GGTCCGGCTT 660

CTGTCTGCCG GCACTCCTCC AGGACATCCT GCGACTGCAG GCGAAGACGG GAAGAGTCAT 720

TCAGTTCTGT AAAGCAGTAT TTATCCGCCA GATAGTCAAT TCGTGTATGC ATACTGAAGA 780

GTATTCCGTA TAAAGATTCA GCTGGCAAAA CTTTATCAGT CTGTAAAAAC TAACGGAAGA 840

GTCGATATTT CTCCCGACAA TCACCGGATG ATTGTTGCAA TACCTCGTGG CATCAGAGAC 900

TGAACAGCAG TTTTTAACGC AACGTATTGC TCTGATGTAT CAGGCCGGAC AACCCGAAAA 960

CAGCCTTCCA CCCGGCATTG TCCGCCAGCG CTTATCACCG GCCAGGTCTG TTGCAGTAAA 1020

TCCGCCACTT GCGAACATGC TTCATCAACT GTGACACTGG CCCGCGGATG GCAAATGCTC 1080

GTCTGGCTGA GCAGCAACAG GCATCGCATT GTTGCTCCTC TATGTTGTTC CCGCAACCAG 1140

CGTAATACCA CCGGCGAGGA TGGACAGGCA GTGTGATTAC GCTCCGTAAT ACGTTCGTGC 1200

ACCCGTCGGT GAAAGGAACT ACAGAATGTC TGAATCTGTT GCCCGTTGAT GTATCCTTCT 1260

GTCGAATGAA GTGTGAAGTG GATTGCCAGC AGATGCGGCC AGTGATCCAC CGCCTGCTGA 1320 

ACAAAACGCC GGATTTCCCC CGGCTCTGAA AGTAAGGCTT CGGTTATTTG CACTATTTTA 1380 

TCTCTGTTGA ATTTGGTTAA GTCGGTGCAG ACGCATCAAC ACAAGTACGG TTCGATGCAA 1440

ACAGCTGTGA CTGGCAATAT GAAAGGAATG ATGAATCAGT CAGGATGACA AAGTGCCGGC 1500 

TGACCGGAGG GGACGCAGGA AGATTCACGG GGGGACCAGC ACCAGGGAAC AGCGCCACAA 1560 

TACCAGCGCT GACACGTTGA ACATTGCCAG CGTACCGGTA TCACAACACG TTTCATACTT 1620 

CTGCCCCCGT GATTCTTCGA TTCGTTACTG TATCTACTGT GACACTTCGC TTTTATACCT 1680 

GCGGCTGGAT CGGCCCGGCT TGATGAATCT TCACTGATCA GCTTATAAAA CCCTCTGTCG 1740

GTCATACCGG TGAAACTGGT GATATAGTTC ATGTCAATCA GGGAATTATC GGCACGCAGA 1800 

AATACGCTGT CGTGGCTTGT TGTAGTCAAC ATGGTCAGAA TGTCCTCTGT GAGATTTATG 1860

AAGATTGTGC GAATGCGGGG AATCTACTGA GCTGTGCTTT CAGAACTGGC CTGTTACGGG 1920 

AKRSCAGGGA TTACCGGCGG GGTAACGGGC TTCCGGATCA TACACACCAC GATTATCGCG 1980 

GACAAAATCA CTGAACGCCC ATATCACCTC TTTAAGTATG TCTTCGCAGC CCGGTACATG 2040

ACGATCCAGC GCCACATCCC GAGTGGTACT ACTTTGATGC GCCCGGTGAC ACAAAGCCCG 2100 

GATTGTTCCA GACATCCTGA ATCAAACGCC CCAGATTAGG GGCGTCGAAA TATGCCTCTC 2160 

TGACCATTAT ATTCCGGTGT ACAGGTAGCA GGTCAGAAGT GACAATGCGT CACCTGACGT 2220 

TAAAAGTCAC TACACCCAAG ATGACGTTCA ACAGCACCAT GCGATTCAAT GTAAGCCCGG 2280

GCTGTCTGTT CCAGTACACC AGGCTCAGCG TTGTATGTGT TAGCTGCATC AAATACCAAC 2340
GACAGCACTT CAGGATACAC AACCAGATGT GTAATGGAGT TATCTTCACC CAATACTTTT 2400

CCCCACCCCT GCTCAATCAG ATTTCTGAGA ACCACCACCT CACGACTCTT ACACCAGACA 2460

TCGTTATTAA GTAGCAGCAC CATAAGATAA GGAGTGGTAT CGTTAGTCAC AGCCTCCCTA 2520

CTCCAGAGAT AATATAAAGG GGTGGGCTCA ACAGATTTAT CTTTACGTCG CTTACACTGC 2580

AAATATTCAG AAATGAGTCT ATGCAGTTCA CCAGTAAAAT CCGCCATCAG AGAGGGAATG 2640

GCCTTATTAA TACCAGGGCA AGGTATTAAT TTAAATTGTA ATAATTTAAT TTCAGGATGT 2700

GTGGCTGCAG CCCGATACAG AGTTGCAAGG ACACACTTTT GCCAGAGGGC GTTACTGGAA 2760

AGCTTAACGT TTGATTCTGT ATAGATAATA AATCACCTTA CAGTTACAAC AGGTCAAAAA 2820

CCGCTGTAGC CAGAGTTACG CTGGCCTGAT GCTTTAGTAC CGGGCTTCGT CAGATAATCC 2880

AGACGCTCCA ATAAGCGCTG ATACTGCTCA GGGAAATCAG GATCATGAAT ATCCTGGATG 2940

TCACGTCCAT TAGCAGGGAA ATGAATAACG CAGCCCCCTG GATTAACAAT GCAGAAATCG 3000

TCCTGAGGTA CTGATCAATA CGGAGAGGAC TCTCGCGTGT GGTTTATTGA CACCACAGTG 3060

CAGATTCGGC GAATCCGCGA TCACGGTGCG ATTTCGTTCC ACAGCACACA ATCATGACCC 3120

CGGGTTTTAT TCAGGTAAGC AGGATTGCGG ATATCCGGTG TCGCGCCTTT CTGTCACGAA 3180

CGGGGTAGGT GCGAAACACC GGATAAAATG CAGGCTGGCA ATACCTCTGA ACGCCCTGCG 3240

CAGAGCGGAT ATTTTGGATT AAGTACTCGC ACCTCCGCAG TCCTGAAACA AGTCTGGCTG 3300

GTAGCTGTAA ACAGACTTCG TACATGTTGC TCTGGAATAG ATCCCCGTGC CACAGGCTTC 3360

GCAGAACTTT TTCCCGGGAA AATGCTGCCC GCACATCACA CAATGCCACT CCAGCACGAC 3420

CGGTAATGGC GATAGAAACA TCGCCATATC CTCAATGTAA GGGTGGGACT TTTCCGGATT 3480

CAGCACCACG CAGGCCGCCT TCTGTTGCGC GCTCAGGGCA TGTAAATCGT GGTCAAACCA 3540

CGCCCCCTGA GCATCTGTCT GCAAAATCAA CCGACGACGA CAGGAAAGGC AGAAACAATG 3600

CCTGATATTT CTGCTAAGGC TGAGGCCGCA CTGATAATGT GTTCACCCGG CGTGATCCCC 3660

AGCCCCGTTT TTATACCGTT CATTCAGCCA CTCCCTCCTC ACTGAAGTGC CCTGTATGGC 3720

AGTGAGTGCA GTACCGCTCC CCATAATAAT CGTGGTGAGA TTGTCTGCAG TGCCAGCTGG 3780

CTTTACGCAC CACGGGTAAG GCATCCGGTA CGAATTTCTG CAGACGCTTA ATCAGTTGTA 3840

TTTCTCTGCG CTCCGGTCTG ACATAAGGGC ACTGTTGACC GTGCTCCGTC AGCCCGTCGT 3900

CAGTGTGTTC AAACCAGGGA AGTTCAGTGT CGTATTGCGG ATGGTATCTG AGCGCACTGC 3960

CGCAAAGGTG GCAGGTGTAG CGGTCGTAAG GTGCAGTCTG TGCGGTACGG GCAGCGGTCA 4020

GACGTCCGTT GCCATCAAAT GCGAGAAAAG ATTTTGCGTA CATAGTATAT GTTCCTTACG 4080

GCCAGACGAC ACGCAGGCGT CAGCGTCCCT TTACGGGCAG CGTGGGCAGG GTGTGAATGG 4140

CGGTACAGTT AAGGGGGGGG TGGAAAATGG GCGGGCTGTT GTTACAGCAC TGTGGATGTC 4200

ACATCATGGC GTACCAACGT AAAAAATAAT CAGCAGGCCC GGATACATCG TTGTCGCCGG 4260
ACATCAGCCC GTCCTGCTGG TTTTCCCGGG CTCAGCCCCG ACTGCAGCCG AAATTACGCT 4320

CACCAGTGGC GTGAGCTTTG GTATGTTCCT TCGCCAGATA GTCAGCACGT TCCAGCACCT 4380

GCTGAAAGCC AGTGTCATCA CCGCGTTCCA GCCACACCGC CGGCGTGTCA GGAAAATGCG 4440

CCAACGTGGC ATAAGGCCCG GCATCCACCC CCAGGGCACT GCACCAGGCN TGWTTAATCA 4500

TCCCGGCCAG TGACCCCGGA TCGCGGTAAT CGCCGGCACG ACACCAGGTA TCCCGGTTGA 4560

CCAGCAGCAG GAGGTGATAG TGTTTTTTGC CCCTGAGTAC CCCGAACTCC CGGGCCCAGG 4620

CGTAATGCAG GGTGGTGGGA TGCACGCGTT TACCTTCACG NCGTTACGCT TCTGGTAAGC 4680

GTCGATTCGG GCTTTCAGGG CATTGATGAA GCGGGATATC ACAGCCGCGT CGGTAGCTGG 4740

CGGTACATCC GGGAGACGCA GATCAACCCG AAGTGCCGTC AGGCGGGGAT GAACATTCAG 4800

TGCGTGCCGC ACCGTCTCAC GAATACGTTG CTGCCAGAAG GGGTTGTATT TGTAGGTCAT 4860

GGTTAAATCT CCGTATGGTT CATACGGAAT AGCCACGTCG TAAAAAATGC GCAGAGCCCC 4920

TGACGTGGCC ACCGACAGAA CACGGCCTCA GGCGCGTTGT GATAACCCAG CTATCGTTTC 4980

CGGACTGACG GTTGAATTTC CTGCGTTGTT TTCTTAATGT AAAAAACCTG CTACGGGTAA 5040

GGCTGTGAGG AGGAAGTGAT GGTGATACGC AAAAAGAAGT GCAGGGACTG CGGAGAAGCG 5100 

ACAGAGCATA ACACGGTATG TTGCCCACAC TGCGGTTCTG TCGATCCCTT CGGCTATTAC 5160 

CGCAATACAG ACAGAATATT CACCCTCCTG ATGGTCCTGC TGGTTGTGGT TCTGCTGATG 5220 

ACGGCTGCGG TCAGCGTGTA TGTGCTGTGG TAGTCGGAGG GGCAGGGAGC AGACGATGAC 5280 

GTAAAATATC TCCGGTGCTC AGATATCACG GCCGGTCAGA CCGCAAACCA ACGGTTAATC 5340

GTAACCGGAT CAGGCAAATG TGTGATTAGC CCCCTGGCGC TCATACCCGC ACCGCAGACC 5400

ACCTTAAGTA CTTCCCGCCC GACACCATTC CCTGCTCCCG GATAATTTGT TGTCGCTATA 5460

CCGCTTAACA TCACCGATAC CACACCGGCG CAGATAGCAC CGGATTCATT GTAGAGATGA 5520 

CTTAAGGTTC AGGTAACATA TTTCCAGACA GAAGCGGGAA CACGATCGTA AAGTTTGTTC 5580 

ATGGTCAGTT CTGCCAGCCG GTGATCAACC GCAGAGTTGA AATTTTCCAG CTCCGCCGGG 5640

GTGAGTTTAT ACCGTGCGTG GGAAATCACT TTTTCCAGTG TCTCCCGGGA TGAACAACGA 5700 

CGGAACTGAT ACAGCCAGTC TTCTTTGGTT TTTACTTCCA TTCGTCTCTC GTTACTTTAT 5760

GCTGCGGTTA ACAGGATGCC GTCAGTATAC CGCATGCAGA CACTCTCCCG CTCCCCCGCT 5820 

TGCTGCGATA CAACTTAACG TTTCAGGAAT CCAGTCATCG CACCGGGAAA GGCTTTCTGG 5880 

TGACAGGAAA CGTCAGGAAC AGGAGTTTCT CAGACTCCCA CTCATCGGAT CAGGCTCAGA 5940 

CAGGATTATT AATACGCTCA GTTCATGTGT CATATACAGG GCATCGGGGA TGAATATATG 6000 

GGTATAACTC AGAGCCTGTA CTACAGCTTT CACTGCTGAC TGATTTTACG TATCAGCGTT 6060

CATGTATCTG CACTCTGATA TAGAATACTT CTACCGGAGC TACTCTTACG TTAGCTCACT 6120 

CTCACATCAG GCAACATCAC TTATTCAGCT CACTTACCTC TTACCACTCA CTACTTCTTT 6180 



WO 98/22575 PCT/US97/21347 




ATATTTATAA TATCAATCAG ACAGCCTTAT CCCCCCGGTA ATATCTGTTG CCTTCCCGCC 6240

AGCCACAGGC TTATTCACCA CAACCACCTC CGATAACAAC TCTGCAATTA TCAGAACGCC 6300

TGCTTCTCTC CCTGTCCTCA CGAAAACTAT CCCCTCTTTA TCGCGCGTGC GTGCGGAAGC 6360

ATCTTTTCGC AACAACCACC CGGGATTCGG CTACGGCTCT GCCATCGCAA TCCCCCCGTT 6420

TATCTCCGGA CAGCCACATT CCCGATTATT TTTTACGTTT CTCCCCGGTT GTTATGCCGG 6480

TGAAGGTGGT GCGTCGTTTT CATCACCACA CCGGTTGCGA TTAACAACAT CCGGAGGAAC 6540

ATTCTCATGA CCACACCCTT TTCACTGATG GATGACCAGA TGGTCGACAT GGCGTTTATC 6600

ACTCAACTGA CCGGCCTGAG CGATAAGTGG TTTTACAAAC TCATCCAGGA CGGAGCCTTT 6660

CCGGCCCCCA TCAAACTGGG CCGCAGCTCC CGCTGGCTGA AAAGTGAAGT GGAAGCCTGG 6720

CTGCAGGCGC GTATTACACA GTCCCGTCCG TAATTTCTGC CCCTTATCCG TTCACCCGCA 6780

GCAGACGCCT CCCCGGCCTG CCGTTGACAT TCTGCTGCCT GTTTTATCCC CGTGAGGAAT 6840

ATGAAAATGA AACAACAGTA CCAGACCCGC TACGAATGGC TCCACGAAAG CTACCAGAAA 6900

TGGCTGACCG GCTTCAMCCG GCACGCGGTA TCCTGGGGCG TGTGTCATCC GAATATCTAC 6960

TATTTCCATA ATCTGACGCC CGGGTGGGTG TCATTCAACG GCGAACAGTC GGAGATTGCC 7020

ATTGTTCCCG GCAGTCTGCA CCGGCTGATT TATGGTCATG ACAAACGGGC CATGCCGCCC 7080

CTGGATGATG ATCTGGTGGT GAATTTATGC ACCAGTGAGA ATCTGCTGGT TCATCATCCG 7140

ATGCTGGAAG GCATTCTGCT GTCTGAGTGC ACGCGCCTGC ATAAAAAATC ACTGGCGAAC 7200

AAACTGATCA GTATATTCCG TCAGTTTGAC GGCACGGAGC TGCGTCTCAA ACTGGTCTGG 7260

CTTTGCTGGT TTGATTTAAT GACCGGAAAC TGCCTTGACG ACTGGACGGA GAACCTGNAA 7320

CGGAAATCAG AAAAAGAGCT GGAGAAATGG ATCATTGAGC GCCAGAACCG GAACGCACCG 7380

CTGACGAATC TGATGGATCA GTACGTGCTC CTGGCATTCC GCACAACGGT TGACGATAGC 7440

CGCAACTGAT GTCTGCATGC TGCCSGCTGA AGCCATATTC ACGGGGCAGG GACGCCCCTG 7500

CTTCCGCAAC AATCCGGGGT AATGGCGACG TACGCCTGCA GAGTGTGTTC ATCGTTGTCA 7560

CAGCCGGACA AGGTGAATAC CGTTGATGAT GCGGGGATGA ACCTGCTGGT CCACCGCGCT 7620

GTCACTCAGA CGCGTCAGCG TGTATGGACG CCCCGATCGA ATGGTTCTTC CGCCAGAGTG 7680

CACAGAAATG AGGCACGGAA CGTTACCTGA AGGGTGACCG GCACGGACTG CAACTTGTTG 7740

CCATTGATGG CGCACAAGTC ACATACAGCA GAATGTCGTG ALLbLALL 1 I 7800

CGAAACGGTG CTGCCCCACT CCACCACCAT CCCGGATAAC GCCATTACGC TGTCTGATAA 7860

GCGCTTTTAC AGCGCAAATC TGGTGCAGAA AAGCGTAAAG CTGACCTGCC GGAGCAGGAT 7920

GTGGGCATGT TGCGGGCTTA CAACCTGATA CGGCATGAGG CACTAAAAGC AGCATCAGAA 7980

ATCAGCCTGA GTTCGCGTTC CGGTTTATCC CGACAGAGAG GACAGTGCCG GGCAACACGG 8040

TGTCACCGGG GAGCATCCCG AAACGACCGG AGCATCTGCG GGATGCTCTG TAAGTGGTGT 8100

TAAGGTGGGC GGTTAAGGTA TCAAAAAAAT CGTTATCCTG TGAAAGACAG TGCGCTCTGC 8160

TGAAGTGAAC GTCACTGCCG GGAAGCATCG GGTTTCGCTA CCGGACAGTC GCGGTAACGC 8220 

GTTTACCGGC ATCTGTCTGT GTGGCAGGGA TGGCTGATAT TGTCGGTTAT ACCAGCGGCA 8280 

GGTGCGTCCT GTTATCTGTA AAATCAGGGC GTGCCGGTAC AGAACGCCTC GTTGATGCCG 8340

GTCACTGAAC GAATCATCCT CTGACGAAAA CAACCGTCGA TACAACGCCG GCGTAAAAAG 8400

AAAACCGGAA ACCATCTTGT GCACGACAGG TACTCAGGGG GGTATAACGC CTGCGCAGGA 8460

TCACATCCGG GAACAGGGCT GCTCCTCAGT GTCTTCGTGT GGCGAAGCAT CTGCAACCGG 8520 

ACGGTAGTGG CCTCAGAGCA ATCTCCCTGC TGCAGTGCAC AGAGTAAGCC GGAAAGCTGG 8580 

TGAATGCCGG GATGACACAC TGCGACGTGG AGAAACAAAC GACAGACTCC GTCCGCAGTA 8640

ACACTGAAGG TAGTCCCGCA AAGCTCAGAC TTCTTCCTGC ACGTTATCAG CGGACTGAAC 8700 

CCCGGTCAGC CACTTAAACC TGCTAATCGT GTTGCTGCAT ACCCGCCGGG CCGGAAGGTG 8760

TTATGAAGCC CGCCACCGGA GCGCTTCTGC AAATATCCGG GGAGATAAAA TTTTGGTGAC 8820 

AGGATGACGG TCGTGCTGCA GACGTAAAGC CGCAGGAGCG GACACGACAG ACAGTGTTCA 8880

CTGTGGCGTC CTTTGCCGTC GGTATCGTGC TCACGCTGAG GTCCCGGGGG TACACCTGAC 8940

GACAAATACC TGGGATTCCC GGGACGGTCT GTTCTCCGTA AAATAAAGAA AATGCGGGAT 9000 

GCCTCCCGGA CTGCAGAGAA GAGGGATTGA CAGACAGTGT ATATTGCGTA CGATTACAGG 9060

GGAAAAACAC AGTAAATATG GAGGTCAGGT CCGAAAACAA CCTACGAAAT TTCTATGAAA 9120 

AACGATTGAA AAAATCATCA AATTCAGTTC GTTTTTCTAT GGTAATTTTT AAACACTCCC 9180 

GATGATAACC TGTTGTATGT GCATGTGGGG AACGCACGGA AAACATCAGA ATCATCTGAA 9240

AAAAACAACG AACACACCAG AAAAACAGGA GCAACCATAA CGAAGCAACA TATTGATTTT 9300 

AAACAGAATT TAAGGTTAAC AGACAAAAAA CACTTTCAAC TGAAGGAGAA ATATACACTG 9360 

GCGACAGTGC AGGGTTTTTC ATGCAAAAAA AATGAGCTTT TATCTCCGGC GCATACTGAC 9420

CGGGATGCAG CCATGACAGA GCAAAAACCA TTAAATATCA GGAGGTTAAA CACACAAAAA 9480

GCTGACATGC ATCAGGGAGC AATCCCTCAC AACAGAGGCT GAGCGGCAAC GCTTCCTCAC 9540

AGGACGGCAT TCCTGAAAGG ACAGGCAGCC ACGGCTTTTT ACTGCCCGTA TCCGGTATAT 9600 

TTATCTGCCG TGACGTGCAG AGGATTTTGT GTTTCCGGAA ATCAGGAAAA CAGGAGAACC 9660 

GCGGGAGATA TGATGGAAAA AGAACCGGAT GATATCTGCG CAGACTGTCC GAATATTGAT 9720 

GCAATAAAAC GGCACAAACA ACAGGCCGGA GCCATCAGGG AATACACTGA GTGGTTAAAA 9780

AAACAACCGC GTGCTTCTTA CTTTTTTGTC TTCCGGTTGT AGGCATACCT TCAGAATGAA 9840

GTGATATCCC GAAAACAAAA ACATTCGGTC ACCAGCGATA ACAGCCATCC CCCGGAATCT 9900 

GATGTCACCC CTCCGGATTT AACCCTTCCC CGTCGCTACT ACTGTGATTA CGGTTACACG 9960 

CCCTACCCCA TGATGGGCGG ACAGATGTCT GTTTTTGCCA CAACGTCAGA AACCACCAGT 10020 
TCGACGAATG CAGTCCCCGG AAACGCAGTT ACCGGGAATG AGACTGAAAA GCATGAAAAC 10080

GCGGTACCGG CGACATTCCC CGTCAGCCGT TCTGCAATGC CCCCGGAACC TCTGCGGTTT 10140

GCCACGGGTT TTCCATCGCA ACCACTGCTT GCCGGTCCCC GGGAAAAGCC GATGCGCACC 10200

GTGCATCCTG ACATCCACAG CGAAATTATA TGGTTCTGCT CCAGTTACCT GCTGAAATCC 10260

GGACCACAGA TTACGAAGAC GATTATCAAC TCAGTATTCT CTGAATGGGC CCGCATCAGG 10320

AATGATTACC CCTCCCCCTT TTGGTGGGTG GACAGCAGGG ACAGTGAACA GTGTGACTGG 10380

TTATGGAACG CCATGCAGCT CCGGTGTGTG GGAACCCCGC TGAATCCCCT TACCCCGGAG 10440

CAGAAATACT GGTTTGCCTG CGCCACGTTT GATAACTGGG AGGGCTGGAA TGAGCAACAG 10500

ATACAGTTTT TACTGAAAAG TAATCCCAGA CGAAACAGAG CGAAGTTTAC GGTCACCTTC 10560

GGCCCTCCCT GGATTCAGCA TAAAGCCATT CTTCTTGATG AGCTGAAGAG TGCCCGGGAG 10620

CAACAAAAAA GGCGCGATGA ACGCGCTGAT GGTTCCGTCC CGCTGAAACT GTCCGGAAAA 10680

ATCCACAAAC ACCTTGAAAG TATTGCCCGG AGTCGTGGTA TCCCCCCAAA AAAACTGCTG 10740

AATGAAATGA TTGAGCAGGC GTACCAGGAC TCAGTGGTGA ACAGCCGGAA TAAACCACTG 10800

ATTTAAAATA ATTTCAGACA GATATTATCT CCGTGAATCC CCCGCCACCT TTCCGGTGCG 10860

CGGGGTTTTG TCTTTTTTCA CCGGGAATAC ATGTATGAAT CCGTCTGATG CCATTGAGGC 10920

AATTGAAAAA CCGCTCTCCT CCCTGCCTTA CTCGCTTTCC CGTCACATCC TGGAACATCT 10980

GCGCAAACTC ACCCGTCACG AACCCGTGAT TGGCATTATG GGTAAAAGCG GGGCCGGTAA 11040

ATCCTCACTC TGTAATGCAC TGTTTCAGGG GGAGGTCACC CCGGTCAGTG ATGTTCACGC 11100

CGGCACCCGG GAAGTGCGGC GCTTCCGTCT GAGTGGCCAT GGTCACAACA TGGTTATCAC 11160

TGACCTGCCC GGGGTGGGCG AGAGCNGGGA CAGGGATGCA GAGTATGAAG CCCTGTACCG 11220

TGACATTCTG CCTGAACTGG ACCTGGTACT GTGGCTGATT AAAGCCGATG ACCGTGCCCT 11280

GTCTGTGGAT GAGTATTTCT GGCGACACAT CCTGCAACGC GGACATCAGC AGGTGCTGTT 11340

TGTGGTGACG CAGGCCGACA AAACGGAGCC CTGCCATGAA TGGGATATGG CCGGCATTCA 11400

GCCCTCTCCC GCACAGGCAC AGAACATTCG CGAAAAAACG GAGGCGGTAT TCCGTCTGTT 11460

CCGGCCTGTA CATCCGGTTG TGGCCGTATC GGCCCGCACC GGCTGGGAAC TGGATACGCT 11520

GGTCAGTGCA CTCATGACAG CGCTTCCCGA CCATGCCGCC AGTCCCCTGA TGACCCGACT 11580

GLAGGACGAb G I bLbLAL-bb CGTGAACAGT TTACCGGTGC 11640

GGTGGACCGG ATATTTGACA CAGCGGAGAG CGTCTGTGTT GCCTCTGTTG TCCGTACGGC 11700

CCTGCGCGCT GTTCGTGACA CCGTGGTCTC TGTTGCCCGC GCGGTATGGA ACTGGATCTT 11760

CTTCTGAACC TGTTGTGGAT GATGTCCTCC CTGCCTCTGA GTCTGCTCAC AAAAGCGCTG 11820

TTTTCGTTAC TGTCTCTCTT GTCCGTGCAA TAGCTCAATA ATAGAATAAA GCGATCGATA 11880

ACTATTTCAT CGATCGTTTA TATCGATCGA TATGCTAATA ATAACCTTTA TTACCAACAT 11940
GCGCAGATA: GCACAGACAG ACATTCAGGG GACGACAGAA CAACACTTCA GAAACTCCGG 12000

TCAGCCGGAC CTCCGGCACT GTAACCCTTT ACCTGCCGGT AT2CACATCT GTGGATACCG 12060

GCTTTTTTAT TCACCCTCAC TCTGATTAAG GAAATGCTGA TGAAACGACA TCTGAATACC 12120

TGCTACAGGC TGGTATGGAA TCACATTACG GGCGCTTTCG TGGTTGCCTC CGAACTGGCC 12180

CGCGCACGGG GTAAACGTGG GGGTGTGGCG GTTGCACTGT CTCTTGCCGC GGTCACGTCA 12240

CTCCCGGTGC TGGCTGCTGA GAT2GTTGTG CACCCGGGTG AAACAGTGAA TGGCGGAACA 12300

CTGGTAAACC ATGACAACCA GTTTGTATCC GGAACAGCTG ATGGCGTGAC TGTCAGTACC 12360

GGGCTTGAGC TGGGGCCGGA GAGTGAGGAA AACACCGGCG GGCAATGGAT AAAAGCGGGT 12420

GGCACAGGCA GAAACACCAC TGTGACCGCA AATGGTCGTC AGATTGTGCA GGCAGGAGGA 12480

ACTGCCAGTG ATACGGTTAT TCGTGATGGC GGAGGGCAGA GCCTTAACGG ACTGGCGGTG 12540

AACACCACGC TGGATAACAG AGGTGAGCAG TGGGTAGACG GGGGAGGGAA AGCAGACGGT 12600

ACAATTATTA ACCAGGATGG TTACCAGACC ATAAAAGATG GCGGACTGGC AACCGGAACC 12660

ATCGTCAACA CCGGTGCAGA AGGTGGTCCG GAGTGTGAAA ATGTGTCCAG CGGTCAGATG 12720

GTCGGAGGGA CGGCTGAATC CACCACCATC AAGAAAAATG GGCGGCAGGT TATGTGGTCT 12780

TCGGGGATGG CACGGGACAC CCTCATTTGC GCTGGTGGTG ACCAGACGGT ACACGGAGAG 12840

GCACATAACA CCCGACTGGA GGGAGGTAAC CAGTATGTAC ACAACGGTGG CACGGCAACA 12900

GAGACGCTGA TAAACCGTGA TGGCTGGCAG GTGATTAAGG AAGGAGGAAC TGCCGCGCAT 12960

ACCACCATCA ACCAGAAAGG AAAGCTGCAG GTGAATGCCG GCGGTAAAGC GTCTGATGTC 13020

ACCCAGAACA CGGGCGGAGC ACTGGTTACC AGCACTGCTG CAACCGTCAC CGGCACAAAC 13080

CGCCTGGGAG CATTCTCTGT TGTGGAGGGT AAAGCTGATA ATGTCGTACT GGAAAATGGC 13140

GGCCGTCTGG ATGTGCTGAC CGGACACACA GCCAGCAGAA CCCGTGTGGA TGATGGCGGA 13200

ACGCTGGATG TCCGCAACGG TGGCACCGCC ACCACCGTAT CCATGGGGGA TGGCGGTATA 13260

CTGCTGGCCG ATTCCGGTGC CGCTGTCAGT GGTACCCGGA GCGACGGAAC GGCATTCCGT 13320

ATCGGGGGCG GTCAGGCGGA TGCCCTGATG CTGGGAAAAG GCAGTTCATT CACGCTGAAC 13380

GCCGGTGATA CGGCCACGGA TACCACGGTA AATGGCGGAC TGTTCACGGC CAGAGGGGGC 13440

ACGCTGGCGG GCACCACCAC ACTGAATAAC GGTGCCACGG TTACCCTTTC CGGGAAAACG 13500

GTGAATAACG ATACCCTGAC CATCCGTGAA GGTGATGCAG TGCTGCAGGG AGGCGCTCTT 13560

ACCGGTAACG GCAGGGTGGA AAAATCAGGA AGTGGCACAC TCACTGTCAG CAACACCACA 13620

CTCACCCAGA AAACCGTCAA CCTGAATGAA GGCAGGCTGA CGCTGAACGA CAGTACCGTC 13680

ACCACGGATA TCATCGCTCA TCGCGGCACG GCCCTGAAGC TGACCGGCAG CACCGTGCTG 13740

AACGGTGCCA TTGACCCCAC GAATGTCACC CTCGCCTCCG GTGCCATCTG GAATATCCCC 13800

GATAACGCCC CGGTTCAGTC AGTAGTGGAT GACCTCAGCC ATGCCGGACA GATTGATTTC 13860

ACCTCCGCCC GCACAGGGAA GTTCGTACCG GCAACTCTGC AGGTGAAAAA GCTGAACGGA 13920

CAGAATGGCA CCATCAGCCT GCGTGTACGC CCGGATATGG CGCAGAACAA TGCTGACAGA 13980

CTGGTCATTG ACGGTGGCAG GGCAACCGGA AAAACCATCC TGAATCTGGT GAACGCCGGC 14040

AACAGTGCGT CGGGGCTGGC GACCACCGGT AAGGGGATTC AGGTGGTTGA AGCCATTAAC 14100

GGTGCCACCA CGGAGGAAGG GGCCTTTGTC CAGGGGAATA TGCTGCAGGC GGGGGCCTTT 14160

AACTACACCC TCAACCGGGA GAGTGATGAG AGCTGGTATG TGGGGAGTGA AGAACGTTAT 14220

CGTGCTGAAG TCCCCCTGTA TGGCTCCATG CTGAGACAGG CAATGGACTA TGACCGGATT 14280

CTGGCAGGCT CCCGCAGCCA TCAGACCGGT GTAAGGGGTG AAAATAACAG GGTGGGTCTC 14340

AGCATTCAGG GCGGTCATCT CGGGCACGAT AACAACGGTG GTATTGCCCG TGGGGCCACG 14400

CCGGAAAGCA GCGGCAGCTA TGGCTTCGTC CGTCTGGAGG GTGACCTGCT CAGAACAGAG 14460

GTTGCGGGTA TGTCTGTGAC CGCGGGGGTA TATGGTGCTG CTGGGGATTC TTCGGTTGAT 14520

GTTAAGGATT ATGACGGTTC CCGCGCCGGC ACGGTGCGGG ATGATGCCGG CAGCCTGGGC 14580

GGATACCTGA ATCTGGTACA CACCTCCTCC GGCCTGTGGG CTGACATTGT GGCACAGGGA 14640

ACCCGCCACA GTATGAAAGC GTCATCGGAG AATAACGACT TCCGCGCACG GGGCCGGGGC 14700

TGGCTGGGCT CACTGGAAAC CGGTCTGCCC TTCAGTATCA CTGACAATCT GATGGTGGAG 14760

CCACGACTGC AGTACACCTG GCAGGGGCTG TCCCTGGATG ACGGTAAGGA CAACGCCGGT 14820

TATGTGAAGT TCGGGCATGG CAGTGCACAA CATGTGCGTG CCGGTTTCCG TCTGGGCAGC 14880

CACAACGATA TGACCTTTGG TGAAGGCACC TCATGCGGTG ACAGCCTGCG TGAGAGTGCA 14940

AAACACAGTG TGCGTGAACT GCCGGTGAAC GGGTGGGTAC AGCGTTCTGT TATGCGCACC 15000

TTCAGCTCCC GGGGAGACAT GAGCATGGGT ACAGGGGCAG CCGGCAGTAA CATGACGTTC 15060

TCACCGTCCC GGAATGGCAC GTCACTGGAG CTGCAGGCCG GACTGGAAGC CCGTGTCCGG 15120

GAAAATATCA CCCTGGGCGT TCAGGCCGGT TATGCCCAGA GCGTCAGCGG CAGCAGCGCT 15180

GAAGGTTATA ACGGCCAAGC CACACTGAAT GTGACCTTGT GATAATTCGG CATTGTCTCT 15240

CTGTGGTCCC GGTCATCATG ACCGGGACCC GGACAGGTGC AAACGCTTCA GTGCCACATT 15300

CACTGGCATT CACAATAACA TGATATTGAT CAGGGAGTGA CTATGTTACA GATAGTCGGT 15360

GCGCTGATTC TGCTGATCGC AGGATTTGGG ATTCTTGGGG TTTTGTTCAG AGCATTAACC 15420

AGCACAGCGT CTGCGCTGGC AGGGTTCATA TTGCTGTGTG TGTTCGGCGC GGCTTTACTG 15480

GCTGGCTATA TCACTGAACG CATAACCCGG TTATTCCATA TTCGCTGGCT GGCAGGCGTA 15540

TTTCTGAGGA TTGCCGGAAT GGTCATCAGC TTCATGTGGG GACTTGATGG TAAACATATG 15600

GCACTGGAGG CTCATACCTT TGACTCTGTA AAATTTATTC TGAGCACCGC TCTC3CCGCT 15660

GGTCTGGTGG CTCTTCCCGT GCAGATAAGA ACCATTCAGC AGAACGGGCT CACACCAGAA 15720

GATATGAGCA AGGAAATTAA CGGGTATTAC TGCTGTTTTT ATACTGCTTT TTTCCTTATG 15780
GCGTGTTCTG CATACGCACC ATTGATCGCA TTGCAGTTCG ATATTTCACC CTCACTGATG 15840

TGGTGGGGCG GGTTGTTGTA CTGGCTGGCT GCATTAGTGA CGCTGCTATG GGCG3CCAGC 15900

CAGATCCAGG CGCTGAAAAA ACTGACCAGT GCCATCAGCC AGACACTGGA AGAACAACCG 15960

GTGCTCAACA GTAAATCGTG GCTGACCAGT TTGCAAAACG ATTACAGCCT TCCTGACTCA 16020

CTGACGGAGC GCATCTGGCT CACGCTCATT TCACAACGGA TTTCCCGGGG AGAACTGAGG 16080 

GAATTTGAAC TGGCAGACGG AAACTGGCTA CTGGACAATG CCTGGTATGA AAGAAACATG 16140

GCGGGTTTCA ACGAAAAGCT GAGAGAGAGC CTGTCATTTA CCCCTGATGA ACTGAAAACC 16200 

CTCTTCCGGA ACCGCCTGAA TTTATCACCG GAAGCGAATG ACGATTTTCT CGATCGTTGC 16260

CTGGACGGCG GTGACTGGTA CCCCTTTTCA GAAGGCCGCC GTTTTGTATC ATTCCACCAC 16320

GTGGATGAGC TTCGTATCTG TGCCTCCTGC GGGCTGACAG AAGTACATCA TGCCCGGGAA 16380

AATCATAAGC CGGATCCGGA ATGGTACTGC TCCTCTCTTT GTCGCGAAAC AGAAACACTG 16440 

TGTCAGGACA TTTATGAACG TTCTTACACC GGTTTTATTT CCGATGCAAC GGCGAATGGT 16500 

CTGATTCTCA TGAAACTGGG GGAAACCTGG AGTACAAATG AGAAAATGTT TGCTTCCGGA 16560

GGGCAGGGAC ATGGGTTTGC CGCTGAACGG GGAAACCATA TTGTCGACAG AGTCCGTCTG 16620 

AAAAACGCAC GGATCCTCGG TGATAATAAT GCGAAAAATG GAGCAGACAG ACTGGTCAGC 16680

GGAACAGAAA TCCAGACGAA ATATTGTTCA ACTGCAGCCC GTAGCGTCGG TGCGGCATTC 16740 

GACGGACAGA ACGGACAGTA TCGTTACATG GGAAATCATG GTCCCATGCA ACTGGAAGTC 16800 

CCCGTGATCA GTATGCCGGC GCTGTGGAAA CCATGAAGAA TAAGATCCGC GAAGGTAAAG 16860

TACCCGGTGT AACCGATCCC GAAGAAGGGT CCCGGCTGAT TCGTCGGGGA CATCTGACTT 16920 

ATACCCAGGC CCGTAATATC ACCCGGTTCG GGACCATCGA ATGGGTCACT TATGATATTG 16980 

CCGAGGGGTC GGTTGTCAGT CTGGCGGCCG GAGGGATCAG TTTTGCCCTG ACGGCATCGG 17040

TCTTCTGGCT CAGCACCGGC GATCGCGATG CTGCCCTGCA GACAGCTGCT GTCCAGGCAG 17100 

GAAAAACCTT CACCCGCACA CTGGCTGTCT ACGTCACAAC CCAGCAACTT CACCGGCTCA 17160 

GTGTTGTTCA GGGTATGCTG AAGCATATTG ATTTTTCGAC GGCCAGCCCG ACTGTCCGGC 17220 

AGGCGCTTCA GAAGGGGACC GGTGCAGGAA ATATCAGTGC CCTGAACAAA GTGATGAAGG 17280

GGTCGCTGGT GACATCTCTG GCACTGGTAG CTGTCACAAC CGGCCCTGAC ATGATCAAAA 17340

TGTTGCGGGG ACGGATCTCC GGTGCGCAGT TCATCAGGAA TCTTGCCGTG GCATCTTCCT 17400 

GTGTGGCAGG TGGTGCTGTC GGGTCAGTGG CGGGCGGGAT ATTGTTCAGT CCACTGGGAC 17460

CATTTGGTGC ACTGACAGGG CGTGTGGTTG GCGGTGTTCT GGGGGGAATG ATTGCCTCCG 17520

CTGTATCAGG AAAAATTGCC GGAGCGCTGG TTGAAGAAGA TCGCGTCAAA ATTCTGGCAA 17580 

TGATTCAGGA GCAGGTGACA TGGCTTGCCG GCAGTTTCCT GCTGACCGGA CATGAGATTG 17640

AAAATCTGAA CGCGAATCTG GCCCGTGTTA TCGATCAGAA TGGTNCTGGA GATCATTTTC 17700 
GCCGCCGGTA 17710

(2) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1803 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

AATAACCAAT AGATGCTTAA GTTTACGATA TGCCTCAACC CGCGTCTGCT CTAAGCTGAT 60 

AAGGCCAGTT TTGTAGAGAT CCGCTGCCAA GGTTGCCTGC GTTTGCACAT CCATGTAACC 120 

GGCGGTGATT TCATTCATGG CATCGTTATC TTGACCAGTC AGCTTAGCAC GCTCCTGTTC 180 

AAGCTGCTTG GTTAGGGCGT CAACTCGGCT CTGTAATGAG ACTACGGCCG GTGCGGTTTC 240

CTTCATATAG CTGCGCAGTT GTTTTAGCTC CCCCTGTTGA CGCACCAGCT CTCCTTCAAT 300

CTGGCTGACC ACTCCCAAGC GTGCGCTGCT GGTAGATTCA GGGCTGAGAA GTTGGTGGCT 360 

ATTCTGAAAT GCTAATACTT TAGCTTTTTC ATCCTGTAAG CGTTGATATG CTCTATTTAC 420

TTCTTTTTCA ACAAAGGCCA ATTGTTCGAG CGCAACCTGA TGACCTAATT TGTTAATAAA 480

ACGCTCCGAT TCTTTGAGCA TTAACTCAAC AACTCGCTGA CCGTATTGGG GATCAAATGT 540

CTGCAACTCA ACGGTAAGTA CTCCTGATAA TTCATCAAGG TGTAACGTCA AATGTTTGCG 600 

GTAATAATCA AGAAAATCTT CCCTACTGAC TCCCTTATGC AACCGCGAGA AATAATCTGC 660 

ACTATCACTC TGGAAATGTG CTTTAAGTGC AAGTTCTTTG TCCAACTTGG CCAGCATATC 720 

CCATGACTTC ATATAATCCT GAACGAGTAA TATATCCTGA TGATTACTAC CACCTATCCC 780

TAACATTGAT AACGCATCAG GCAACATTTT AACTTGATCG GCTTGTTTAA TCATTAATTC 840

AGCCCGGSTC ACATAACGAT CGGAAGCAAT GAAGCCAAAA TAGAGCACTG CGATAGAAAA 900 

GCAGATAACT ACCCAAAGAA AACTGCCTAG CTGTAAACTT TTCTTCCACG AGCGGTGTAC 960 

AATTTGATAT CCTCTCGAAT CAATCAAAAA TAGTTTTGGA TTATTGCTCA GTTTTCTTAA 1020 

CTTTCGCGTA AGGCGAGATA TTGAGGATGA AGAATTCGGA GATGTCATAA TCAGTTGCTG 1080 

CTCAAAGTGA CTGGTAAATT TTGATGGCAT CATCAATATT ATCAAAAACT TCTAATTTAC 1140

CATCACGTAA CAAGATGCCC ATATCGCATT GTTGTCGTAG ATTTTTCATA TCATGCGAAA 1200 

CCATAATCAA ACTAGCTGTT TCTCGCTTTT TGTTAAATAC ATCAATACAT TTTTGTTTAA 1260 

AACGTGCATC ACCTACTGAG GTAATTTCAT CGGTAAGATA TATATCAAAA TCAAAAGCCA 132 0 

TACTAACAGC AAAAGAAAAT TTTGATTTCA TGCCGCTAGA GTATGTTTTA ATAGGCAGCT 138 0 

CATAATGTTG TCCAATTTCA GAAAACTCTT TAACCCACTC TTCTACGGGG CTTGTATCGC 14 4 0 

GTACACCATG AATGCGGCAA ACAAATCGCG TGTTTTCACG ACCAGTCATA CTACCTTGAA 1500 
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ATCCCCCAGC TAGTGCTAGA GGCCAAGATA CTCGGCAGAG AC GAG T TACT TTCCCCCTGT 1560 

TAGGCGTATC CATCCCTCCT AACAAACGTA ACAAAGTAGA TTTYCCKGCT CCATKGATAC 1620

CTAGAATACC TATATTACGG TCCCTTGGTA GCTCAATATT TACATTCCTC AGGACATAAT 1680

TTCGTCCAAA TTTAGTTGGA TAATATTTTG ATACATTATC AAGAATAATC ATTTTTCTTA 1740

ACGCTAACTA GCAATCAATT GGCGATGCCG TAATCGGTAA CAACTCATAG CAAAAGTGAG 1800 

CAA 1803 



(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1283 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 

NGGACCCAAG GTAAAAACNG GTAAAAAAAA CMATTGACCG ATTAAACTTT ATTTCTCTGC 60 

CCGCATTAGT CTGGAGAGAG GATGGATGTC ATTTTAATTT NACTAAAGTC AGTAAAGAAG 120 

CAAACAGATA TCTTATTTTT GATCTGGAGC AGCGAAATCC CCGTGTTCTC GAACAGTCTG 180 

AGTTTGAGGC GTTATATCAG GGGCATATTA TTCTTATTGC TTCCCGTTCT TCTGTTACCG 240

GGAAACTGGC AAAATTTGAC TTTACCTGGT TTATTCCTGC CATTATAAAA TACAGGAAAA 300 

TATTTATTGA AACCCTTGTT GTATCTGTTT TTTTACAATT ATTTGCATTA ATAACCCCCC 360 

TTTTTTTTCA GGTGGTTATG GACAAAGTAT TAGTACACAG GGGGTTTTCA ACCCTTAATG 420

TTATTACTGT CGCATTATCT GTTGTGGTGG TGTTTGAGAT TATACTCAGC GGTTTAAGAA 480

CTTACATTTT TGCACATAGT ACAAGTCGGA TTGATGTTGA GTTGGGTGCC AAACTCTTCC 540

GGCATTTACT GGCGCTACCG ATCTCTTATT TTGAGAGTCG TCGTGTTGGT GATACTGTTG 600 

CCAGGGTAAG AGAATTAGAC CAGATCCGTA ATTTCCTGAC AGGACAGGCA TTAACATCTG 660 

TTCTGGACTT ATTATTTTCA TTCATATTTT TTGCGGTAAT GTGGTATTAC AGCCCAAAGC 720 

TTACTCTGGT GATCTTATTT TCGCTGCCCT GTTATGCTGC ATGGTCTGTT TTTATTAGCC 780 

CCATTTTGCG ACGTCGCCTT GATGATAAGT TTTCACGGAA TGCGGATAAT CAATCTTTCC 840

TGGTGGAATC AGTCACGGCG ATTAACACTA TAAAAGCTAT GGCAGTCTCA CCTCAGATGA 900 

CGAACATATG GGACAAACAA TTGGCAGGAT ATGTTGCTGC AGGCTTTAAA GTGACAGTAT 960 

TAGCCACCAT TGGTCAACAA GGAATACAGT TAATACAAAA GACTGTTATG ATCATCAACC 1020

TGTGGGTTGG GGTGCACACC TGGTTATTTC CGGGGATTTA AGTATTGGTC AGTTAATTGC 1080 

TTTTAATATG CTTGCAGGTC AGATTGTTGC ACCGGTTATT CGCCTTGCAC AAATCTGGCA 1140

GGATTTCCA-3 CAGGTTGGTA TATCAGTTAC CCGCCTTGGT GATGTGCTTA ACTCTCCAAC 1200 
TGAARTTCAT CATGGGAAAC TGGSATTACC GGRAATTAAW GGTGATATCA CTTTTCGTAA 1260

TATCCGGTTT CGCTATAAGC CTG 1283 
(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6836 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 



TCAACCTGAC CAACCACTAG AATCAACTCA CGTCCGTCGT TAGGGGGCTC ATATTCTTGT 60

GTACTCCCCA CATTGTATTT ACTGACTCGT GATGATTGTA ATTGCGCTAA TAATGACTCT 120

GCGCGTGCTT CTTCTTTCGC ATCTAAAACG TACGTAGTGA GTAACTGCTC AAGCTTACTC 180

GGACGGCGGC TATCAAAATA GATTCCAACG GGGTCAATCG AGAGTGATGA AGGTCGACAT 240

AAATTAGACC CCAATCCGTT GGAGCGGATA AAACCATCTT CAATCCGGAT CACTGATTGC 300

AGTTCAGGAT AACGGTTTCC CCACACCAAC ACCTGTTCAT CATCTTTTAA CTGTGAGGGC 360

ACAGTACGAA CAAAACAAAG TTCATCTGCC AAATACGCAC AAAATGTGCG TATAAAAGCA 420

CGCTTCCACA GAGAAAAACC AACGAGATAA AGACGACGCC AAGGTTTGGG CTCTACCTGC 480

TGCTGAGCCA AAATCGCTAC AACATCTTCT ACCTCACAAC GTTTTCCCAA TATAGGATCT 540

AAATAACGCG GATAACGGAT CAACGCCGCC GCAACTAAGC GGGGCAATGA AATAGATGAA 600

MLbLL. 1 1 L-oo CGTATACAAC GTTTACTGTC ATGCGTTAAC 660

CCCCACCCAG CATAAAATGG CATACCGAAG CAATATACAG GTTTGCCCAA CAGCAACGCT 720

TCCAAAGCCA ACCTGCGATG AAACTGTGTA CACCGCATCC ACCATACGAA TTATTCTATG 780

CGGATGGCAA GTTCACTCAC CACCTCAACA TCAGCCAGTC GAGGATCACG CCCCACTAAA 840

CGTGCTAACA CGCCGCTTTT TTTGCTAAAG CGTGTATCTG GGTGTGTTCG CAACAATAGA 900

CGCGCATTAG GGTGATTACG GCGAGCCTCG ACCACCATAG AAACAAAATC AGCTTCGCAA 960

GCAAGAGCCC CAGAAATTGA CAAGTCTCCC GCTACTTGAT CCACAAGCAA AATACGCGGT 1020

CTTGGATCAT CCAGTAAACG TGCTAAGTTT GAATGAGCCG TGAGGTGAAT AACTCAGGTT 1080

GTATATGTGT CGGTAAATCT AAAGAAGGCC CGTCAGTAGC ACGGGACAGA GCCATTAAAT 1140

GTATGCTCAG TGCTATTGGG TATAGCAGTT ATACTTGGTG ATTCCTAAAC GCAAAATATC 1200

MGAGATCAGA TGCTCCAGCG CGCGCAAAGT AAAGCCGTAT CCAACAGGTT CCAATAATAA 1260

GCTGTTCTAA TTGACTCGTC TGATGTGCAT CATAATATAT CCCCAGAGGG TCAGCAATAA 1320

GAGAAACCGC CTTTCCTCCT TTTGCTGGGT GCCCGATATA GCCAATAAAA CCATCTTCAA 1380

GTTGCCAATA AGATATTCCT AACTCTTGAG CTTTCTGTTT AATCTGCTTA GTATTAGATT 1440
TTTTTCCCCA GCCAACTAAA ACG7CATTTT TAGAAAAAGC CTCGTCTCCT TTCATATAAA 1500

GCAATGGGTG AGCAAGCATA GGCTCAATAT TATTTTYTCT GGCAAGAATC CCTTTCGATC 1560

CCGTATATAA ATACATGTTG TCTCTGTGAA CTGAAGATTC TCTACAATGG TGTATAAAGT 1620 

GTGATTTAGA TGAACAGCTC TGCGCTCTCT AATGACTTTG CAATACTATC TTTTGCTGAA 1680 

GTGAGAATGT CCGCCTTTAA CTCGGGCCAC CTAATACCAA TTGTAGGATC ATTCCATGCA 1740

ATGCCTCTAT CACTGGCAGG GGCATAATAA TTAGTTGTTT TATACAAAAA TTCGGCCGAT 1800 

TCAGTCAGTG TTACAAAACC ATGGGCAAAT CCTTCCGGAA TCCATAATGT CGTTTGTTTT 1860

CCCCTGAAAG ATGAACGCCA ACCCATTGTC CGRAGCTCGG TGAGCTTTTG CGAATATCTA 1920 

CCGCAACATC AAACACTTCA CCGGCTACAC AACGCACTAA CTTGCCCTGG GCATGGGGAG 1980

GTAACTGATA GTGCAAGCCA CGGAGTACCC CTTTAGAAGA TTTTGAGTGA TTATCCTGCA 2040

CAAAGGTAAC TGGATATCCT ACAGCCTCTT CAAACAACTT GTGATTAAAA CTCTCAAAGA 2100 

AAAAACCACG CTCATCTCCA AATACTTTTG GCTCAAAAAT AAGCACACCA GGAATTGCTG 2160

TCTTGATTAC ATTCATCTAT ATGCCCACAT TTAATTAAAT ATTTTTAGGG GAAGCATATT 2220 

CCCTCCCCCT TCTCAATTAC ATCACGCCTT ATCAATCATT TTTAATAAAT ATTGCCCATA 2280 

GGCGTTTTTT GCCAACGGAG CAGCAAGYTC ACGAACCTGG TCGGCACTAA TAAACTTCTG 2340

GCGATAAGCA ATCTCTTCCG GACAAGCCAC TTTCAATCCC TGACGCGTCT CGATGGTCTG 2400

AATAAAGTTA CTCGCTTCAA TTAGGCTTTC GTGGGTACCG GTATCAAGCC AGGCATAACC 2460

ACGCCCCATC ATTGCCACCG ATAGATTGCC TTGCTCCAGG TAAATACGGT TCACATCGGT 2520

GATTTCCAAC TCACCACGCG GCGATGGCTT GAGACCCTTG GCAACGTCCA CAACGCTGTT 2580 

GTCGTAGAAA TAGAGGCCGG TGACTGCGTA STACTCTTAG GCTCCAGTGG TTTTTCTTCC 2640

AGTGAAATAG CGGTACCTTG ATTATCAAAT TCGACCACTC CATAACGTTC CGGGTCGTGC 2700

ACATGATAGG CAAATACAGT AGCACCGGTC TCTTTGGCCG CGGCTGCCTC CAACTGTTTC 2760

TGTAGGTCAT GACCGTAGAA GATGTTATCC CCCAGCACCA GTGCACACGG GGCTGAACCA 2820 

ATGAATTCTT CACCTAGAAT AAAAGCTTGT GCCAACCCGT CTGGGCTTGG CTGAACCTCA 2880

TATTGTAAAT TCAGTCCCCA GTGGCTGCCA TCACCCAGCA ATCGCTGAAA GGANGGAGTA 2940

TCTTGTGGAG TGCTAATGAT CAAAATATCG CGAATTCCAG CCAGCATCAG GGTGCTCAGC 3000 

GGCCGCAGTA CTGGATCATC GGCTTGTCAT AGATGGGCAA CAACTGCTTG CTCACCGCCA 3060 

TAGTAACCGG ATAGAGACGT GTACCAGATC CACCGGCCAG AATAATACCT TTACGTTTAG 3120 

TCATGATGCT TGTTTCTTAT TTTTAAATTA CATAAGAATA AAGTGGCTTG AGCCGCGCCT 3180 

TTCTGTTTTA TCCTCACCTG TGGTTTACTT CCCCATGATC TCAGTCAACA TCCGCTCAAC 3240

ACCGACTGAC CAGTCCGGCA AAACCAGATC AAATGTACGC TGGAATTTTT TAGTATCAAG 3300 

TCGGGAATTA TGAGGGCGTT TCG 2CGGGGT CGGAAAGGCG CCTGTCGGCA CTGCATTAAG 3360
CTGTGTGACT GCCAGTTCAA CTCCTGCGTC TCTGGCTTTG TCAAACACCA ACCGGGCGTA 3420

GTCAAACCAA GTGGTAGTAG CGGAGGCAGC CAAATGGTAC AGCCCGGCAA CGTCGGGTTT 3480

GCTCTGTGCA ACTCGGATTG CATGGGCGGT ACAATCGGCC AGCAACTCAG CTCCAGTTGG 3540

AGCGCCAAAC TGATCATTAA TGACCGATAT CTCGCGACGC TGTTTGCCAA GACGCAGCAT 3600 

AGTTTTGGCG AAGTTGGCAC CGCGCGCAGC ATAAACGCAA CTGGTACGAA AGATAAGGTG 3660

ACGTGAGCAG AGTGCCGCAC CGTGTTCCCC TGCCAGCTTG GTTTCGCCAT AGACGTTGAG 3720 

CGGGGAAATC ACATCGGTTT CCACCCAAGG ACGTTCACCA CTTCCATCGA AAACATAGTC 3780

GGTGGAATAA TGTACTAGCC ACGCACCTAA TGCTTCAGCT TCTTTGGCAA TAACGGCCAC 3840

ACTAGTTGCA TTGAGTAACT CGGCAAATTC CCGCTCACTC TCCGCTTTGT CGACTGCAGT 3900 

ATGGGCCGCT GCGTTAACAA TCACATCGGG CTTGACGAGA CGTACCGTTT CAGCCACCCC 3960 

TGCAGAATTG GTAAAATCAC CGCAATAGTC GGTGGAGTCA AAATCAACGG CAGTGATGTG 4020

CCCCAGAGGC GCCAATGCAC GCTGCAGCCC CCATCCACTT TCTGGCCACA CCAGACTCGC 4080

CAGCAAAAAA GTGAGTGCTG TCAATAACTG AACCAGCGGA TAACGCTTGC TGATTTTCGC 4140

CTGACAGTCG CGGCAGCGCC CTTTGAGGAT CAACCATGAG AGCAGCGGAA TATTGTCACG 4200

AACGCGGATG GTCTGCTGGC AATGCGGACA GTGCGAACGC GGTAGCGCAA GGCTTATTTT 4260

TGACTGCGCA CTCGGCATTT CACCATGAAA CTCCGCCATT TGTTGGCGCA GCATGATGGG 4320 

GTAACGCCAA ATCAGCACAT TGAAAAAACT GCCGATGATG AATCCTCCGA CGGTTGCCAG 4380 

TATGGGCATC GCCGGGGGGT ATTGCTGAAA AACATCAAAA AGCATGGTTA AAGGTTATTT 4440

GTTGTAACTT GCCGGATGCG GGCCTGCGGG TGTATGCCAT ACGGGTTTCC TTCAGGCCCG 4500

ATGCGCCTTA TTTGATGCCG GATGCGGCGC GAGCGCCTTA TCCGGCATAC AGGGTTACTC 4560

AGCTGACATC TTATGCTCGG TAACCTGATT AATGGTTTCC GGCCCTTGCT GCGGTTTCGG 4620

CAGATTAAGC GCCGCCAGTG TCTCGTAAGG CGACTGGCTC ACACCGCCCT CGAAGTTCAT 4680

CTCGCTCGCT CCCGGCAACT GGTAAGCATT CGCGCCCGGA TTCCATTTCT TAAAGAACTC 4740

CGAAAGATCC GTCTGGGCGA CCCAGGATGC ACACAGCATG AGCTTGTCGG CAGCGTTACC 4800

GTTGGATTCG GCACAGTAAT TTCTTTCGGC AAACTTGGTT TTGCCAACCT CATCGCCGCG 4860

TGCTTTACGG TGCATCAACT GGAACAGGTT CCAGCCTTTC ATCCCTTCAC GATCGCTGTA 4920

GAACTTAGGG AGGTCACCTT CTGGATACCA CTGTTTGATA TCAAAGTTTT TCTCTGCCCA 4980

CTCTTTCAGC TGTGCGTAGA TCAGCAGACG GTCACCCGCA CCGCCGCGCG CCCATGCCTG 5040

ACCGTTGCTC TCCTCCAGAT ATTCGGGCGC GACGGTAATG TCGTCAGCGA CACGGTTCAT 5100 

CTTGCCGAGA TAGCGATCCT GCATGTACAG CGCCAGCACG TTGTTCGCTA CTTCAGTTGC 5160 

GCCAGGAACA GTCAGCGGCG TTTCGGCGGC GTTGTGACCA ACTTCGTGCC AGATCAGCCA 5220 

GTCGTTCAGC GGCGTCGTCG GCAGCGTGGT GCTGTTCGTC GAGAAGCTGG TGTTCATTAC 5280 
CGGATAACCA GAGTGCGCAT CACCGATGGA GATCTGCACA TCGTTGGTGA AACGATGCTT 5340

GTGGCCCGTC AAGTTTTTAT AGGTAAACAT CCGGTGCTTA CCGTCTTCAT CATTACGACC 5400

GTAGAAGTCA TTCATCGAGC TGGCAAAGGT ATCCAGATCT TTAGCGAATT CTGCTACGCC 5460

ACCAGTGAAA TTGCTGGCCT CAAGGTTCTT CTTCGGCGTG GTGTAGACGA AAGCGTCTGA 5520 

CTCCAGCTCG CCCAACGGCG CAGGGGAGTT CAGAGCGTTT TTCCATGCGC CATCTTTATA 5580 

GAACGGCGCT TTCACCACAC CAGTAAAGGT GAATTCGGCT GACTCATTCT GTGGGCTGTT 5640

GCCCTTGATA TAAATCAGAC CACCGTAAGG AACCGTAAAC TTCACCTCAC CATTGGCTTT 5700

CAGCTCATAG GTTTTCGTCA CTTTTGGCGG AGGGTTCAGA GCGACTTCAT GCTTCTCAGG 5760

TCCGGTAAGG TCGTCGGCCA GCGCCACGGT GACAGTCACA GGAACTGATG CAGAAGACTC 5820 

AATGGTGACC TCTTTCTGAG CCGGAGCCCA CAGGCCAGTA GACTGCATGT TACCCGCAAA 5880

CCATTTGGTC GGATTCGAGT ACAGGCTGAT GGTTTCAGTA ACCTTCTCAC CTTCTGCCGA 5940

TACCGCTCCC GGATAGTTCT CGACATCAAC TTTGATGTTC AGATCCCACC AGGAACGACG 6000 

CAGCATCAGG CGCGTCAGCG GTTTTTCCAT ATAGTTGAGC GGATAGCTCG GGTTCATCAT 6060

GCCCGCTTTA TTAACGCTCT TCTCGCCGTA GATCATGTTG TTATCGACCA GCGATTTTTT 6120 

CAGCTCATCA GAAACACTGC GTGGCGCCAG TATAGGCATC GTTGGCGTAG CAGTTCAGGA 6180

ACTCGGTGAA CGTTTTAAAG CCCAGCTCGT CATCCTTGTC GTTTTCATAG CGATATTCAA 6240

TTTTATTCCA CAGCCAGACC GACATGTTCT GGTACAGACG TTCCAGATCG ACGCTGCTCA 6300 

GACGCTCACC TTTGCGACCA TTGGTCCGGA AGTAGAGCTC ATGCTGATAC AGACGCTGAA 6360 

TGTTGGTGCC TAAATCCGCA GCCTGCACCA TCGCTTTTGC CGTGTCGGCG TTAAGGCTTA 6420

GTTGCGTATA CTGTGGAACA TACATGCCAC CAGTAACCGG AACCCCCGTG CCAGGACGAT 6480

ATTCCAGACA GTTGACCTCG TAGTGGTAAG TTGGGTCCTT ACACTCCTTT AATCCAGGAA 6540

ACTTCTCAAA GATTTTTGCC TTCGCAGCCT TCAGAGAATC CTCTGTTTTA TGATCGGCCT 6600 

CATCAATAAA GGCATAACGC GTTTCCTGTT TGCCATCTAC ATCTTCCAGC CAGCTGGCAA 6660 

CTTCCAGCTT CGGTTTGTCA TCAGGTTTGT TTTCTACCTG ATATTTCCAC TTAACTTCCC 6720 

CTGTCTTACT ATCGATGGTG TACGGCAGCG CAGCATCTAC GGCAGGATAA CGTTCATAGA 6780 

CCCAAATGCC CGTTGCGCGC TGCTGACGAA CGCGGTTCGG ATACCCTTGC GGATCC 6836
(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1332 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
GGAAAAACNC GCCGTATATT AGCCCGCGCG GAAAAAGCCC CGTNACGGGC AAACGCAGCA 60

AGGTTTTATC CCAGCGCAGG CGCATGGCAG GATTTTTGAG TAGCCGTTGG GCCAGCACCA 120

GAAGCCCCAG CAATCCCGCC AGCCAGTAAA CGCCGCTGGT CTGTAAGGTG TCGCTCATGG 180 

CGATGAGCGT GCGGGTGGAG GCGGGCAGCG CGTGTCCGAG ATGATCAAAC TGTTCGATGA 240

TTTTTGGCAC CACTGCCGTC AGCAAAATAG TGACCACGCC CGTTGCCACC ACCAGCAGTA 300 

CCAGCGGGTA GAGCATGGCC TGCAGCAGGC GTGAATTTCC AGNACCTGCC GCTGTTACGG 360 

TGTAACCCGC CAGGCGATTG AGCACCACGT CGAGATGTCC GGATTTTTCT CCGGCAGCAA 420

CCATCGAAGA AAACAGGGAA TCAAAGACGC GGGGATGTTC GCGCAGGCTG TCCGACAGGK 480

TGTAACYTTC CTGAATCCGC TGCGCAGCGG CATTCGGAGG GTTTTTACAT GCAGTTTTTC 540

ACTTTGCTCA CTGACCGCCT GTAAGCAGGT TTCCAGCGGC ATTGCTGCGT GTACCAGCGT 600 

TGGCAGTTGG CGCGTGAACA GCGCAAGATC TGCCGCCGCC ACGCGACGAT GTGCGTGCCG 660 

CCGACGCTGC AACATCCCCC CTGACGAAGT ATTCATCCGG GGTTCAATAT GCACGGGGAT 720

AAGCTCTTTA CCGCGCAACA ACTGGCGGGC ATGACGCGCG GAATCCGCCT CAATCATACC 780 

TTTGGTTTTG CGACCATTAG GCTCCAGCGC CTGATAGTAA AACAGTGCCA TTACGCCTCC 840

ATGGTTACCC GCAGAACTTC ATCGAGAGAG GTTTCTCCGG CGAGCACTTT CTCAATGCCG 900 

TTGCTGCGGA TACCCGCAGA GTGTTGTCGG ACATAACGTT CCAGCTCCAG CTCCGCGGCC 960 

TGACGGTGGA TCAAATCACG CAATGTGGCA TCCACCACGA TCAGCTCATG GATGGCAGTC 1020 

CGTGCGCGAA AACCTTTGTG ATTACAGGCG GGACAGCCCT GTGGATGGTA CAGAGTGACG 1080 

GTACGGGCGT GGGTAATTCG CAGCAGGCGT TTTTCTTCGT GGGTGGCAGG CGCGGCCTGA 1140

CGGCAGTCGG AGCACAGCGT GCGGACCAGT CGCTGCGCCA TCACGCCCGT CAGACTGGAA 1200 

GAGAGCAGGA AAGGCTCCAC GCCCATATCC TGCAAACGTG TGATCGCCCC CACCGCTGTG 1260

TTGGTATGCA GCGTGGAAAG TACCAGGTGT CCGGTCAGTG AAGGCTGAAC AGCGATTTCT 1320

GCGGTTTCGG TA 1332 



(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4407 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 

CCCAACGTTT ATCGTATTTC ATTAAAGTCC CTTGCCCGAT GCTATCTCGA GTTACATGAC 60 

GAAATCGCTG ATTTGGATGT CATGATTGCG GCAATTGTCG ATGARCTGGC GCCTGAACTG 120 

ATTAAACGTA ATGCTATTGG ATACGAAAGC STTCGCAGTT GCTGATCACG GCAGGAGACA 180 
ATCCCCAACG ATTAAGATCA GAATCAGGTT TTGCGGCACT GTGTGGTGTC AGCCCTGTTC 240

CCGTATCTTC AGGAAAAACG AATCGTTATC GACTTAACCG GGGTGGAGAT CGTGCTGCAA 300 

ATAGTGCACT TCACATCATT GCCATCGGAC GTTTGCGAAG TGAGGATAAA ACGAAGGAAT 360

ATGTCGCCAG ACGAGTAGCG GAAGGGCATA CAAAAATGGA AGCAATACGC TGCCTGAAGC 420

GCTATATCTC ACGCGAAGTT TATACATTAC TGCGTAATCA AAACAGGCAG CTCAACAGCA 480

TCCCGATAAC GGCTTGACTC TTAGAAGGGC GTCCAGGGCA GCCACTATAC AAGCAGGCAG 540 

TTCCGGCAGT TACTGTGGCG TTACCAGATC AAACAGAGTC TGAGTCGACG AGGAAATTGC 600 

TGGGATAACA GCCCGATGGA GCGCTTCTTC AGGAGTCTGA AAAACGAGTG GATACCGGTG 660 

ACGGGTTACA TGAACTTCAG CGATGCTGCC CATGAAATAA CGGACTATAT CGTTGGGTAT 720

TACAACGGGC TCAGGCCGCA GGAATATAAC GGTGGGTTGC CACCAAATGA ATCGGAAAAC 780

CGATACTGGA AAAACTCTAA AGCGGTGGCC AGTTTTTGTT GACCACTACA TTTAGTGCGA 840 

CACGGGAAGG GCGATATGAA CGATACGATA CATCAATGGT TTATTGCGGT GATAACCTGA 900 

AGGGTGAGAT TGAGGCTATT TATAATAGTC TTGAGAGGCG TCAGGTTTAG AGCAGGAATG 960

CTGAGTAGCC ATCTTATCGA TTGTTTTCGA GCGTAAGATG GCTGAATGGA ATGGCTATTA 1020 

TTGCACAGTG CTTAATTATA ACATTCATAC CGACATGATT ATCTTCTGTC CGGAAGAATC 1080 

AGAGGCTGCG GTTTCAGACT GTCTGCCGGT ACATTCCTCT CTCCGTTAAA AACCATAACG 1140

GGTTCATTAT CTTCGTCTGT CAGCAGATTG AATGGCGGTA TATTTTCAGT ACGAATGCCG 1200 

GTCAGCCACT GAAAAATACC TGCGAAATGA CGGGCACTGA TTTTTCTGCT GACGGACTGA 1260

TGAGACGTGA TGTCACTGGC GGTAATAATC AGGGGAACGC TGTAGCCTCC CTGCACATGA 1320 

CCATCATGAT GAACAGGATT AGCACTGTCG CTGACCGACA GACCATGGTC AGAAAAGTAA 1380

AGCATGGCAA AATGACGGGA ATGCCGGCGA AGGATACCAT CAAGCTGCCC GAGAAAGTTA 1440

TCCCAGTTTA CTGATGCTGG CGAGGTAACA GGCAATTTTT CGGGGATACT GCCCCAGGTA 1500

ATGATTCGGC CAGGAGTTAA GCCGGTCACA CGGGTTCGGA TGAGACCCCA TCATGTGCAG 1560

GAATATCACT TCGGAGAGGA TTTATCCGCC AGTGCACGTT CTGTTTCCTG TAACAACAAC 1620 

ATGTCATCGG TTTTACGGGA AGCAAAGCTG CCTTTCTTGA GGAAAACGGT ATGCTCCGCA 1680 

TCAGAAGCAA TAACAGAGAT GCGTGTATCA TGCTCCCCCA GCTTTCCCTG ATTGGATATC 1740

CACCATGTGC TGTATCCTGC TTTTGCTGCC AGCGCCACCA CGTTGTTGCC GGAGTCAGGG 1800

TTCTGGTCAT AGTCATAAAT CAGTGTCCGG CTCAGGGAAG GTACGGTACT GGCTGCTGCC 1860

GATGTATAGC CGTCAATAAA TAAACCGGGA GCAGTATTCA GCCACGGTGT GGTTGGCACG 1920

GGATAGCCAT ATACCGACAT ATAATCCCTG CGCACACTCT CACCAGTGAC GATAACAATC 1980

GTGTCATACA ACGGTACACC CGGCAGGATT TTCCAGTTGT CAGCCCCGTG CTGATTCAGT 2040

TGTTTATAAC GCTGCATTTC ACGCAATGTG TCAGTTGTCC CCACAACAGT TCCTTTAACC 2100 
ATCCGCAACG GCCAGCTGTT TACTGAGCAT AATACGAACA GGAGCAGTGC GAGGGAGTTA 2160

CGGTGACCGC GGTGGTGTGT TCGCCAGAAA ATCACCATGA ATACCAGAAT CGCGGCACTG 2220

ACCAGAAAAT GATAAACAGG AATCATCCCG GTAAACTCCG CTGCCTCATC AGTTGTGGTC 2280

TGCAGCAACG CAACAATAAA ACTGTTGTTG ATTTTACCGT ACGTCATACC GGCAGGCGCA 2340

TACAGTGCAC AACAGAACAG AAATAACAGC GGTGTAATGG ATGTGAGGGT ATTTGTGTGT 2400

GCAAGAAGGA GAAGAAAGAA CAGCAGCAAC ACATTCCCGG TGGTATTCTT GTCAGTGTAT 2460

CCGCATGCAA TTGTGGTTAT GACAGAAAGA AGAAAAAAGA ATAAAAACAA TATAATCCTG 2520

AGAGTGTTGC CCGGACAAAA CAGTTTTCTG ATATTCATCG GAGTATATGG ACAACATTAT 2580

TATGAAGAGA ACAGGATAAT AAAAATCAGA AGTTATCTGT GAAACAGATA ACAGACANCC 2640

CTGCAGTATA ATATTACTGC AGGGTGTTCC TTTTTAATTA CAGAAATACG TAATTATCTT 2700

AATTGCAGAA ATATGCGCAA TTATCGTTCA GAAGCAGTGT CGTGAGAAGT TATAAGTCAC 2760

ACCAAGCAGG ATGTCATGAC TTTTAACATG AACCTCTGAT TTATATTTAT CCCCTTCTGT 2820

ATCCTTGTAA TACAGGGAGG ATTTACCAGC ATCGAGATAG CGATAGCTGA GGTCAAGAGC 2880

GATATCCGGG GTTACGTCAT AGCGAACACC GGCGCCAATG CTGCATGCGA AGTTGTCAGC 2940

AGAGCCTGAG CGTGATATAG AATAAGGCAC TCGCTCACCG TAGCCATAAT CCCAACTACC 3000

GCTACCTGTT GATTCCTGAT GAATTCTGGC GTAACCAATT CCGGGAGACA GGGATGGCGT 3060

AAATGCACTG TCGTTTCTGA AATCATAGTA CGCATTCAGC ATCAGGCTGT TGACTGACAC 3120

CTCATTCTTC AGGTCACTAT GTCCCGCGTG GTCCTTATAG AGGTTGTATG TTGTGTCAGC 3180

TTTTCCACGG GCGTAAAACT CCAGTTCTGT ACGGACAGGA ATACTGAACT GCGGATGCAA 3240

GTCATAACCA AACGCTATAC CTCCACTGAA TACCGTGTTA TGGCCATCCC CCCCCTATAC 3300

TTTGATGTTT CCTCTTTATT TTCGGACAGG AAACTGTGGT CAGAAAGAGA TACTGCTGAA 3360

GTACCTGCTT TACCGGTCAG ATAAAAACCG CTTTTACCTT CGTCAGCACC CGCATTTGCT 3420

GCAANCATAC AGGCAGCGGT AACTGGTGAA ACAGCAAAAA CTTTTTTCAT TTCAATTAAC 3480

TCCATTATTT CACTATTTTT GTAAATAGCA CTCCTAATAT TTTAAAACCA GTCAAAAGAT 3540

AGTATCAAGC AAATTATTCA TGTCTAATGA ACAGATAAAA TCGACTATGT GTCGGCAAGA 3600

CTCTGCTCGA GCGATATTCC TCTTATTTCC GCCTCGATGA AATACGCCCG TTAGCTTATT 3660

TGTACCCCTT ATAATGGGAT GTTGGCCAGC CAGACCCGGC ATGATTAGTT CTCCCTGTCG 3720

ACTATGCTCC GGGAGGGATG TCACCGGGTC TGGTGAGGCG CGGATAACCG CTAATAGGGG 3780

AAGGTCAGGT ATTTTACACC GGGACCGTCA GGGCAAGATA ACGAAAGCCA GCTCCCCGGA 3840

TGAACTGACG CCAGATAGTT TGTGTGCATT GCTGCTTTTC TCATCTTACG TCTTAACCCT 3900

GCCTTGAATA CCTTATCTCT CGTCAAAATA TTAATAGCGA TATGCCGTAT CCCTGAAAAT 3960

AATCCCGCTG CGTTTCCTCT TCTTAGTTGC AGTCGTCTTG ATTCATTACC AGGTCCAGAC 4020

GCCATGCAGC TTATTCTCCA CGTGCCAGTG ATTTCGGATC GCTGTGACGA ACTTGTCTGC 4080

GGTTAAATCA GCAGAACTGA TATAATATCT GAGGATTATT TCTGACTCTT GGTTTTGTTC 4140

TGCTATTATT GAGGGAAAGG AGACTGCCAG GCATATTTTT TCAGCCCTTT GCATTCAAAC 4200

GTGAATTGAA TCAGGTCATC AGGGACNTCG CCAAAGCATA TGAAGACGGG ATCCTNCTCT 4260

GCGGTGAGTC TTGTGACTAA TTGCGTAACA GTGATGGTCN GGGATAATTA AATCTTTCAG 4320

GGGAAATAAA AAGATTATCA GATATGGGGA TGACACCACA GGAGGGGTGA GGCGAGTATG 4380

GATAAACGAT GTACCTTATT AACCAAA 4407



(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 824 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 

TTTTTTGCAA GAGAATTTCC CTGAACCTGA AGCTCATCAT CGCCATCTCC GCCGTTCAGG 60 

TAATTATTAC CTGCTCCCCC AATTAACTTA TCGTTGCCAT CACGGCCATA GAGCTGGTCA 120 

TCTCCGTTTC CACCACTCAG TGTGTCATTA CCTTTATCAC CATATAAGCG GTCATTCCCG 180 

TCATTTCCTT CTATATGGTC ATCACCATCC GCGCCATGGA AGATATCAGC AAATTTACTG 240

CCAAAAAACT TGTCGGCACG CGTGGTCCCA ATAAGTTCTT CCACGGAATA TAAGTTATCA 300 

GTCTCTGTTA AATTTTTACC ATTGATATGA GTGAATTCAT AACTCCGATA TTGCGTTTTT 360

TCAGTTCTTT TTCCAACTGA AACCTCCTGC TCCTTCACAA CTTCCTGTAA AACCTTAACA 420

TCACCACCAA GTACACGTGT TACCGTGTAA TTACCCGCTT CGGTTGCTTT TGTGCCATCA 480

ATGGTCAGAT AACCGGTGTC TGTTTTATCA TAATAAACAA CATGATGTCC TTTACCTGCG 540

TAGATATTGG CTGAGCCGGC AGATAAAAAG ACCTTATCAT CCCCGTCTCC CAGGTGTGAC 600 

TCAATACGAA TTTGCCGATA CTGGTTATTA CCGACTGATG CATGCTGAAT CAGGTTAGAG 660 

TAATCATATA CAGACCCCTT GTCCTGNAAC CCCCTTCACC GTCCATTTAT CAACACCCTT 720

GACTAATAAC TCGGTAATAT ATTCATATTT TCCGGACTGC CTCCTTTCAC GAATTTCCTC 780 

ACCGGGAGTT TAACAATGGG CGTAACNAAT TTGCAATAAC GTGG 824 
(2) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 550 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

GNGGCCGCAG TACTGGATCA TCACCGAAGT TTCGCGCGGA AAAGCGTTAG AGAAAGATCT 60 

AATGCTTCAT GATGGTGATG GACTTTTCCT GATGGTGAAA TCCAGCGGGA AATGCTCTGG 120 

CGTTTCCGTT ATCAACATTC GACAACAAAG CAGCGGACAA TGATGGGACT CGGTGTCTTT 180

TCCACACTTT CACTTGCTGA TACCCGAGGG CTAAGAGTGG ATTATATTTC CTTATTAGCC 240

AACAGAATCG ACCCGCAAAT TCAAGCTAAA GCCGTAGACG AAGAGCAATA TTTGAAAAGG 300 

TGGGCACCTA CGTTACCAAT ACTGGCTTAA TGGCTACATA CGGCGGTCAG GGTCAGTTTA 360

CGCTTACAAA ATATAAAACA ATTTGATACA AAATATTCCT CTTATTCTAA ATAAAAGTAT 420

CTTGAAAACC TTCCAACTGG AAGGTAGATT GAATTTATGC TAAACATAAA GAGGAATTGC 480

TTATGAATTA CGTTATCCGC ACTACCACCG TCGTCTTTAG TCTCATGCTG GGCAGGTTAC 540

GCAACTGCTG 550 



(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 382 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 

CACTAAAGGC CCTGGATGTT TTTCGCTCAT TAGTAGACAT CTCGCTGATA ACGGCGCTCT 60 

ACGCGCACTC ACTTAAAAAT TCATCCGCCG CTTCGGTGTC CATGCCACCA AATTCGGCAA 120 

TCACTTCCAG AAGTGCCTGC TCAACGTCTT TCGCCATGCG ATTAGCGTCG CCGCAGACAT 180 

AAATGTGGGC ACCATCATTG ATCCAGCGCC ACAGCTCCGC GCCCTGTTCG CGCAGTTTGT 240

CTTGTACGTA AACTTTTTCT TTTTGATCGC GCGACCAGGC AAGATCGATA CGTGTCAGCA 300 

CGCCATCTTT GACGTAGCGC TGCCAMTCCA MCTGGTACAG GAAGTCTTCC GTAAAGTGCG 360

GATTACCAAA GAACAGCCAG TT 382



(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
TAAATCAGCA GAACTGATAT AATATCTGAC CATTATTTCT GACTCTTGCT TTTGTTCTGC 60 
TATTATTGAC CGAAAGGAGA CTGCCAGGCA TATTTTTTCA GCCCTTTCCA TTCAAACGTG 120
AATTCAATCA GCTCATCAGG AACATCGCAA ACAATATGAA GACGGATTTC TTCTCTGCCG 180

TGACTCTTGT CACTAATTGC GTAACAGTCA TGCTCTGGAT TATTTAATTC TTTCAGCGAA 240

AATAAAAGAT TATCAGATAT GGGATGACAC ACAGCACCGC TGAGCAAGTA TGTATAACCA 300 

TGTACTTATA ACAAAAGGAG ACGTAAGAAG GGGAACGGGT ATCAGAGGGC CAATCAAAGC 360 

AGGTATAATG AACGCCAGTA TAATTGTCCG CAACCCAGAA ATATATTATT GAACTGGTTA 420

TCTCCTGCGA ATGCATATAC TGCAACGGCC GTTAAAATAG CATTATATCC ATAAAGCCCG 480

GCAGAGATTT TATCAGGAGA AAGCTCAGGA ATACAGAATG ATACCACCAC ACTCAGAAAC 540

GAAGCGACAA CCGTAATCAT CAGTAGTTTC CGGCTCCCTG CAAGTAGTCC CAGCATAACA 600 

AGAATACCGC CGACAGCATC AGGAAACATA AAAATCTCCA TAAAGCTACC AGACAATGCC 660 

ACCGGATAGT TTTTCAGCAA AACAGAACCT GCACTTCGCC CGAAGGTACT GACATATCAT 720 

GAGGCATTAT TCCGGAATGT AATAACCACG TAGCGATAAT AAAGGGGGCG GTCAATACGG 780

GTAACCCTCT GAGCACTGAC GACAACAGGG GAGTAAACAA AACAATACCA AGAGTTCCGA 840

CGATAAGTAC AGCAATTCCG GAGACTGACA CAGGGACAAG CATGCCACAG GCTATGCCAT 900 

ACAGAACAGC ATTATATCCC CATATACCTT CATTAATCTC CTCATCAGGA TACCGCAAAC 960 

ACCAGGCAAA GAACGGAGAA AGTGCTGCAC TGATGGCTGA GAAATACAGT ATTTCGGGGT 1020 

GCCCCATATT AAAAGAGGCT ATTCCAGTCG CCAAAAAAAA GAACAAGCCA GAAACAACAT 1080 

TGTTCTGTAA TAATACCTGT GAATACCCCT TACTAAAGGC GGTTATCACC TGTTTTACTC 1140

TCATGTAAAA TGTCACACAC ACCTCATACA TAAACCATTC TCCGCTTCTG CGGGACAGTA 1200 

CCGCCCCTGA CTCCACCTCA CAGCGGATTG TGTATTTTTA AACAATCACA GTCTTCTCAT 1260

ATACTTTCCA TTCTGAAGCT TATCTCTTCC TCCGTGATAA GCTTCCGTCG CGGGATGTGT 1320 

TATACGCCCT GTAAGACAGT TATAAAGGAC ATCAATGCCA TAGTTAATGA YTACCGAATT 1380

CCGGTGGATA GTCAGTACTG GTTTGCCACA AAACAGTGCA GTCACACATG ACAGGAGAAG 1440

ATATGAGCCG GATACCGCTG CTCTGAGACT TAACGCTCAT GTAAACTTTC TGTTACAGAT 1500 

TCTTCCAGGG ACTAAGAAGA TAACTGANTT ACGTTCGCAT TCCAGTSTTT ATTTCTGCAG 1560 

TGACAGCCAT ACCCGAGCTT AATGGAATGT GCTTATTCCC GGTTGACAAA TCATTCTCTT 1620

CAACAGAAAC AATGACATTA AAAACGAGTC CCAGTTTCTG GTCTTCTATT GCATCTAAAT 1680

TTATATTTTT TACCTTACCC ACCAGATAAC CATATCGGGT GTAAGGAAAA GCCTCCACTT 1740

TAATGATGGC ATTCTGCCCG ACGTTAATAA AACCAATATC TTTATTTTGT ACCAGAGCAG 1800

TAACCTCCAG CGTGTCATCT TCCGGAACGA TGACCATCAG TGTTTCCGCT GTTGTAACAA 1860

CCCCACCTTC AGTATGAACC TTCAGTTGCT GAACTTTTCC CGAAACAGGG GCCCTGATTA 1920 

'"GAAGCCTG TTGACGCTCT TCATTTTTCT CTAACTCCAG AGTTAATAAC TCAATGCTGT 1980 

v GTTGTTTG TCTTAGCTTG TCTAAAATTT CATTTTTAAA AAGCTGCGTG ACAAGCTGAT 2040






WO 98/22575 PCT/US97/21347 




ATTCTTCTTT TGCAGACAAT ATCTCACTCT CAATTTGCTC CAGTTGCGAT TTATAAACCC 2100

GTAATTCATT TGCTGCCTCA ACATATTTAT TCTCCTGCTC AAGTACAGCA TGTTTTGCAA 2160

TTGCCTGTTT ATGCAACAGG CTCCTGAAAT CATCCAGACG GCTTTTTTCA ACCCTCGATA 2220

CATTTTCATA ACGGTTTATA CGGGCAAGTA TTGTTAAWCG CTCTGCTCTT TTCTTATCCA 2280

GATTCAGTTC TTTTTGATAC TTCTGATTTT GCCATGTGGA AAACTGTTCT TTTATCAAAG 2340

AAGTTAAACG CAGTACTTCC TCTTCAGATA CATTCTGAAA ATAAGGCTCA TCAGGAAGTT 2400

TCAGTTCAGG AAGTTTATTT AATTCAATTG ACCGGCTCAG AATTTGATAC CGAATTTGTT 2460

CCAGCCTGGC CTGTAACAGT GATGACTGCG TTTTTAACGT ATCAGCTTCA GCTCCCAGCG 2520

CTGTAAGCTT TAATAACACA TCCCCTTTCC GGACTGACTC TCCTTCTTTT ACGAYAATTT 2580

CTTTAACTAT CGAGTTTTCA ATAGGTTTAA TTTCTTTNTA CGCCCACTGA GTGTTAATTT 2640

CCCATTTGCA GTGGCAACAA TTTCCACCTG GCCTAAAACA GATAAAATGA AAGCAATAAC 2700

CAGAAACCCC ATAATAAAAT AAGCAACCAG ACGCGGCCGT CTGGATACCG GCGTTTCAAT 2760

TAATTCCAGA TGAGCGGGTA AGAATTCATT TTCGTCCTTT TCACGTACCG GAGTATCTAA 2820

CTGCTTCCGG ATTTTCCATG TTTCACTCCA GACAAGTTTA TAGCGCAACA GGAACTCGCT 2880

GAACCCCATT AACCATGTTT TCATATTCTT CTGTTCTTTC TGTTAGTCTG ACTGTAACTG 2940

ATATAAGTAA CTGTATAAAC TTTCCGGTTC AGAAAGCAGC TCCTTATGTT TACCCTGTTC 3000

AACAATTTTC CCTTTTTCCA TGACAATAAT GCGGTCTGCA TTTTTTACTG TAGACAGACG 3060

ATGAGCAATG ATTATAACCG TTCTGCCCTT ACATATTTTG TGCATATTGC GCATGATGAC 3120

ATGCTCCGAC TCATAATCCA GAGCACTGGT TGCTTCATCA AAGATGAGTA TTTTAGGGTT 3180

GTTCACCAGC GCCCTTGCAA TTGCGATGCG TTGACGTTGA CCTCCGGATA ATCCTGCCCC 3240

TATACCCCTr ACGCAATTCA GAAATAAAAT CATGAGCACC 3300

TGSTAATTTC GCTGCATAAA TAACTTTTTC GACGGACATG CCAGGATTAG CCAGTGAAAT 3360

ATTATCAATA ATACTGCGAT TAAGCAGCAC ATTGTCCTGC AACACAACCC CCACCTGACG 3420

ACGTAACCAG TTAGGATCGG CCAACGCAAG ATCATGTCCA TCAATTAAGA CCTGGCCATT 3480

TTCAGGAATA TAAAAACGTT GAATTAATTT AGTTAATGTG CTTTTTCCTG AACCAGAACG 3540

TCCGACAATA CCAATAACCT CCCCCTGCTT AATACT 3576



(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3541 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 




TCAGCCCGGT GAGCGGGTTT GAGAATTGCG CAGTCACCAT TGGGCTAAGG GTTATCAGGT 60

GGGGTTAAGG AAATGGCAAA ACCTACCCCC GTGCAAACTC CAGT 2GCTGC ACATTCACCA 120 

TCCCTGGGTT CTCACCTGCG CTGACATCAA TT7GTGTCAC CCGCAGCGCA TATTTTTGAT 180 

CCAGTGGTTT TAACGAGTTC AGCAGGTCAT TAAACACCAC AGGTTGTATC CAGACGTGGA 240

TATTCTCCCC GCGCTCGGGA ATGGGTTTGA TGACCACCGA GTGGGCGGAA GCTGTCACTG 300 

ATGACCCGCG ATACCTGTGC TGGCGTTGTC GTGCCGGATT TTCGCGCCGC AATAATATCC 360

GGCGCGGCGC TCTTCAGTCG CGCGTTCATC GCCACCAGCT GGTGCAACAT CGTCTCCTGT 420

TGCTCAATCG GTTCGGTCAA CGGCTGCCAG ATGAGAACGT AATATGCGGC GCTAAACAGG 480

AACACTACCG CTGCCAGTAA CATGCCTTTT TCACGGGGCG AACGGGCCGC CAGGTGTTGT 540

GTCAGCCAGT GTTCGCCACG GCTTAAGTGG CGTTGACGCG ATTGCTGAAA ATAGTGAATA 600 

AATTTATCGG GTAACATGTT ATTTCCTCCG CAAGGTTACG CCGCCGGAAA CCGCATCACC 660 

CTCTTTCTGT AAGGCGTCGT GTTGCACAAG ATAATGTGCG GCCAGTGCGC TACGAGTTTA 720 

TCGAAGCTGG CAAAGTTCGC AGCGGGTAGC TGGAGGTGAA GCGTGTGGCG TTTTTGATCA 780 

AAGGTGAAAC ACGCATTTCG ATGTGGGTAA GTGACGCTGA TTTCAGGGTA CTGGCGATCG 840

CTGACAATTG TGCGAGCAGC CGGGTATCGT CGGTGTGTGG GCGATATTTT TTCAGCGCCA 900 

TCGTCACCTG AGAGCGTAAA TTCACAATCC GCTTCTGCTC CGGGAATAGC GTTAAGAACT 960 

GTTTCTCCGG CTGGGTGCGG CTTTGCGCCA CGTGTTCGCT GAGGCTCCAT AACGTCACGC 1020 

CCCGTTCCAC TAGCAGCGGA ACCAGAATGA ACAATATCGG CAGAATCATC ACCCGCCAGC 1080 

GCGCCCACTG TTTTCGGTAG CTGACACGAG GCTGCCACGG GCGTGTTAGC AGGTTCCCTT 1140 

CCGGTTCGCC ATAAGTGGTA ATGGCGGGCA GAGCGTAACG GTGAGGGTTG GGCGTCTGCA 1200 

CCAGCCCATG GAGAGAGTTC TTCCGGTGGA ATGCCGACCA CGGTTAGTGA AAGCGGTAAA 1260 

TCCTGCTCAT TGAGCTGTGC TCGGAACATG ACCGGAGCGA GCGCCCGCCC GGCGCTCCAT 1320 

CCCCGGCATT CATCGATGCG GMAGATAACC CGTTGCGCAT CGGGAGCCAT AAACCCACAA 1380 

GGAATGGACA TCGAGTCCGG CGCGACGATA GCGCGGGTGA TGGGGTTTGC CTGCAACCAC 1440

TGCGCAATGT TGCGCATATG CTGGTGGTGA ATGAGAGGTA CGGTTGCCAG TTGCTGGTCG 1500 

ATTTTCAACG GGGCGAAATG CAGTTCATGG ATATCCTGGT TCAGCTGTTG TTCCAGCAAG 1560 

GCGGGCAGAA TCGTCGGTAT CTGGTTGGGG GGCACATCAG GGAGTTCAAG CTGGCAGACG 1620

CTGATCCATT CGCCGGGAAT GTAGAGTCGA ATCGCATCAG TTTGCAGCCA TTGCTGGAGA 1680 

CATTCATCAG GAACGTCAGG CCAGATGCCG CACTCCACGT CGGCGGTACG ACGCTGCCAA 1740

CGGATGGGAG CGGAAMGNCA AAGCGGGAAA AAAATCTCAA GCATGGAAGT CACTCACTTT 1800

CTCCTGTCTG ATGCGAGAGA AGAGAAAAGT GTTGTGGGCC CATGGGGAGA ATTAACGAAT 1860

TCATCGTCAG TTCAATCTCA TTCACGGTGA TATCTGAACG CAGCTAGAA3 TAATTGGTGT 1920 
CCACGCTCAG GACGGTTTTT AGCTGTTTTT TAGTACGCTG ATCGACGTCA GCAAGTAACG 1980

GCTGTGCAAG AAACTGATCG ACATCTTOCC AGCCCTTOGC ATGACGTTOT TGTAATAACG 2040

CTCGCGCCTG AACAGGGCTT AACCACGGCT CAAACAGCGC CTCAAGAATC ACACTTTGCG 2100 

TGACGTCTAA GGTATTGATG TTGATTTGCT GGCGGGTCAT CGGCAGCGCA CAGACCAGCG 2160 

GTTTCAGTTT TTGATAAAGC CCGGCGTCCA TTCCCTGCAC CACGCGCATC TCGCTGATAT 2220 

CAGCCAGCGG TTGATTAGCG GCGTAAAACG GCACCGAACG GGCGAGATAC TCGCTGTCTT 2280 

CACGGCCCAG ACGCGTCTGC ACGCTGCGGT CTTCGTCAAT AAACTCCCAC AGGCTTTCGG 2340

CTATCAGTTG GGCCCGATAA GCAGGGACAT CCAGGCGCGT GATCAGGGCA ATCAGTTGTT 2400

GTACCGCGAG CGGACGCGAC GCCGTCGTCG GCTGAGCGAG GGCATTCAGG TTAAAGCAAG 2460

CCTGTGCGTC ACGCAGAGTG ACGGCGATTT GCCCTGCGGC AGTGGGAAAA AACGCGGGCC 2520 

GGAAGCCCNA CGTGCGCCAG ATGCACGCGC TTTTCATTTT TGAGGCTCAG ACTGAGTGCG 2580

CTCAACGCCA GGCTTTCCGC ACTGGGGCTG TACCACAGCG CCTGCTGGTA CTCCTGCTGG 2640 

TGCGCGTTCG CCCAAGTTGT TTCTGGATCC GCCCGGAAAG CGTGATGGTC AGCAGCATCA 2700

TAACCGCCAG CAATACCAGC ACCAGGACCA GTGCCATTCC GGGTTTTGGT GGTGAGGTGA 2760 

TCATGATAAT TGCGGCCCGC GTAACAACCA GATGCGTTCA ATTTCGCCCC ATTGTGGCGA 2820 

ATGCAGGGTT ATGCGTACTG CCACGGGGAT CGCCTGCACT GATGACCAGC TCTCCTGCCA 2880 

GCGCGTGCCG TCGTAGAACT GCAAAGGGAG CGAATGCGGG GGGATTAATT TTTGCGTTGT 2940 

TGGCTTCAGG CTGCCTGCCG CATGGGTCAG TGGCCAGGCT AACGGTTCGA GATAACCACC 3000 

ATGAATGCGG TAACCGACGG TGAGCAGATT ACTGCGCGGC AGAGGGATGA ACGGATTAAC 3060 

CACGCCGCCA CGTACAAAAC GCATCCCTTC ACTCTCAGAC GGCAGCACGC CAGCGCCCGG 3120

CAGTAACGCT RGTTCACGCT GGCCCTGATC GCCTCTTACC GGACGCGGCA TGATTTGTGT 3180 

CAGATCGTG j GTCAGAAAAC TCATCGTTTG CTGCATGAGG TTTAGTTTTT GATCGTGTCC 3240

GGCGACGGCG GTATTCACGC GTGTAACCCG TTTGTCACCT GCTGCGCCAT CATTGCCAGT 3300 

GAGGCAAAAA TGGCTATTGC CACCAGCATT TCCAGTAACG TGAAACCAGC GCGAGTCCTT 3360 

CTCACTGTTG GTCTCCCACG GCGCTAAACC ANGCGCGTCG TGACTGAATC ACTGACGAAA 3420

AGTCNTCATG AAGACTGACT TCAATATCCA CNGGATGGAG CAGCGCATTA NCGGTATTCA 3480

GTGGTGTTGG TTCGCCAGAA CCAAGCGGGT TTCCTGCCAT AATGGCTCTC GGCCCTGGGT 3540

G 3541

(2) INFORMATION FOR SEQ ID NO: 81:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1234 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:

GTACTGGACA TCTTTGATGA ACAAGCTCCT CAGTGTAAAT TGTACGTCTC TGATCGTAAT 60 

CTTCCTGAGG GCGTTGAACA TCTATCCGCT GAATTTATAC CCTATACTCC TGAGTCGGCA 120 

GATTTTCTGA TTCAACGTTT TTTCTCTGAA ACTATCCATA TTGAAAGTGC AATTGTTGTT 180 

ACAGCACTTA AAATTGCCAA TCAGATTGCT CTATCTCAAA ATGAGACCAA GAATGTGTAT 240

CTGCTTGGAT TTGATTTTAC GATAAAGGGG GGGTTCACTA GCAAGATCCC CTGCGCAGCC 300

TTGCATGCOG AACCAGAATA TCAAGAGCGA ATTATCAGTA GTCAAGAACA GCTATTGCAG 360

ATGCTCCTTG CAGAAAAAAC ACGCCTGAAT ATCAATATCA ATCATGTTGG TAATAAGCCT 420

TACAGCGTAT ATTCTGTTGA TGCATTTAAT CAAGTGTTCG CTGCCCGCCA TCGTGGAGTC 480

GTGCTGCCCA CACATGCCCA GATTTCCACT ACATCATCAC AAAATGGGGT GAAGGTGATC 540

GCAGAGATTA CTACTAATCA CTTTGGTGAT ATGGACCGAT TGAAGTCAAT GATTGTAGCG 600 

GCCAAGCAGG CAGGGGCTGA CTATATCAAA CTGCAGAAGC GTGATGTTGA AAGTTTCTAT 660 

AGCAGGGAGA AGCTGGAGTC ACCGTACAAC TCTCCTTTTG GCACCACCTT TAGGGACTAT 720 

CGGCATGGCA TTGAACTCAA TGAAGAGCAA TTTTCCTTTG TCGACTCTTT CTGTAAAGAG 780 

ATTGGTATCG GCTGGTTTGC TTCTATTTTA GATATGCCCT CGTATGAGTT CATTCGGCAA 840

TTTGAACCAG ATATGATCAA GCTACCATCA ACTATATCTG AACATAAAGA TTATTTGGCT 900 

GCTGTTGCTT CTGATTTTAC TAAAGATGTA GTAATTTCAA CTGGTTATAC TGATGAGGCC 960

TATGAGCGTT TTAYCCTKGA TAACTTTACC AAGGTTAGAA ATATTTATCT GCTGCAATGC 1020 

ACCTCGGCTT ATCCCACACC GAATGAAGAT ACCCAGCTAG GTGTGATAAG ACATTATTAT 1080 

AATTTGGCGA AAAAGGATCC ACGTATTATT CCTGGTTTTT CCAGCCATGA TATTGGTAGC 1140 

CTTTGTTCCA TGATGNTGTC GCAGCCGGTG CAAAAATGAT TGAAAAGCAT GTTAAATTTG 1200 

GCAATGTGGC TTGGTCTCAC TTTGATGAAG TTGC 1234 



(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6313 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:

ATGGGACCTT TCTTCAATGA TGTTGCCGAG TGGTTAGAGT CATTAGGTCG TAACGCTGTG 60 

AATGTTGTAT TCAATGGAGG AGATCGTTTT TACTGCCGTC ATCGACACTA TCTGGCTTAT 120 

TACCAAACGC CGAAAGAATT TCCTGGTTGG TTACGAGATA TCCACCGGCA ATTTGACTTT 180 
GATACCATTC TCTGTTTTGG TGACTGCCGT CCATTGCACA AAGAAGCAAA ACGTTGGGCG 240
AAGTCTAAAG GGATCCGCTT TCTGGCATTT GAAGAAGGAT ATTTACGTCC GCAATTTATT 300
ACTGTTGAAG AGGACGGTGT AAACGCGTAT TCATCGCTGC CGCGCGATCC TGACTTTTAT 360
CGTAAATTAC CAGATATGCC TGCACCACAT GTTGAGAACT TAAAACCCTC GACGATGAAA 420
CGTATTGGTC ATGCAATGTG GTATTA^CTG ATGGGATG-3C ATTACCGACA TGAATTCACT 480
CGCTACCGTC ATCACAAATC ATTTTCTCCT TGGTATGAGG CTCGTTGCTG GGGGCGTGCG 540
TACTGGCGTA ACTATTTTAC AAAATAATGC AACGTAATGT ATTGGCTCGG TTAGTGAATG 600
ATCTGGACCA ACGTTACTAT CTTGTTATTT TACAAGTTTA TAATGATAGC CAAATTCGTA 660
ATCACAGTAA TTATAATGAT GTGCGTGATT ATATTAACGA AGTTGTATAT TCATTTTCGC 720
ATAAGGCACC GAAAGAGAGT TATTTGGTGA TCAAACACCA TCCGATGGAT CGCGGTCACA 780
GACTCTATCG ACCATTAATT AAGCGGTTGA GTAAGGAATA TGGCTTAGGC GAGCGAGTCA 840
TATACGTACA CGATCTCCCA ATGCCGGAAT TATTACGCCA TGCAAAAGCG GTTGTGACAA 900
TTAACAGTAC AGTGGGGATC TCTGCACTGA TTCATAACAA ACCACTCAAA GTGATGGGTA 960
ATGCTCTGTA CGACATCAAG GGGTTGACGT ATCAAGGGCA TTTGCACCAA TTCTGGCAGG 1020
CCGATTTTAA ACCAGATATG AAACTGTTTA AGAAGTTTCG TGAATATTTA TTGATGAAGA 1080
CGCAAATTAA TGCTGTTTAT TATGGTGTAA AATCAAAAAG CAATAGAAGG TCCGCATTCC 1140
TAAACGGTAG CAGATGATGG TTTTCATGGG CGTTTCAGGT TACTCAATCA GCCAACAACC 1200
GCAGCGAAAA CCCTGCTTTC TCGACCAGTT CAGGCCGGTT TTACCTCCAA TGCTTTCCGT 1260
CAGAACTGAG ATTTCAGCCA GTTGCCGGAT AAGTGTGTCG ATTTGCAGCA GTATACTTTT 1320
TCGTACAGCC AGAATGTGGC AGACTGAGGT GGAATAGATA ACGTCCGTAT GCCCGCTCAC 1380
CACCTCCGGG CGGGAGTGTG TGGTATCTGA CATCATCATT TTTCCTTTCT GTTTATAAAT 1440
GAAAACGCCA GCCGTGTTCA GGCTGACGTC AGGGAAGTGA AATCGGGTGA GTGATCTTCA 1500
CTGGTTCTGG TGCAAAAGTT ACTGTTGGCG CAGGGTACGG ATACCCTCCC TGGCCTGTTC 1560
GATACAGGGC AACAGTGCTG CCGAATCTGT TTTATCCTCA TCGTTGTCGA AGATAATTCC 1620
CGATTCGCAG TCGATATTGT CCTGCAGCCA CGTAATCAGA ATATCCAGGG CTGTTTCCGT 1680
GGTTAATGAT TTCATGTTGT GAATTTCCGG ATTACCAGTC GAAAGTGGGT AAACCTGGCA 1740
ArTGGCATCC AGATGAATGA GACTGACACC ATAACGCCGG ATGAGTGTGA 1800
CGACCAGACG ACGGAACGTA ACAGATAACC GGTACCGGTA AAATGAATCC ATTCTGATTC 1860
ACCAAAGTCA CTGGTCTGGT GTAACAGCGA GTACAGCCAG GCGTTGTCCT TTTCCGTGAT 1920
ATGTGCGGTA CTGCAGCGTA T3CCGGAAAG AGTCGTAAAC GGTTGTGGAG TGCAGGTTGA 1980
CTGTTGGTCA GATTCATCCA CCACGCGGA3 TGAATAACCG TTTTCAGCGA CCTTGTTAAT 2040
CAGTTCAGCG AGATTAATAC CATCGACGTC AACGACAATG CGCCCCATAT TCAGTGCCTG 2100
TACGTTAACG CTGTCGGCTT CCGGCGTCAG GGAAAGTTTC ATTGTTTCAC CTCCGGGTGC 2160
TTACCCAGGA TAATATTATT TACCGCTCTG TAATTGTCGC GGGTCATCAG GCCGGTCGCC 2220
CTGCGAGCCC GGAGGATATC GATGCTGTTT ATTAACTGAG AGCGGGTACA GGCGCTGAAT 2280
CCCGGCTGGT CGGTACGCAC CAGCGCGTAT TTTTCCACGA GAAAGTTCAC CGCATCACAC 2340
AGTGAAATGC CTGCCTCAAT ATGCTGCTCG ATCACACGTT CATCGGCAAA CGGTGTGTCA 2400
TTCAGTGTGA GGCCGTAGTG CTGGTCCAGC AGTCGGGACA GAAGTATCTG CCAGATTTCA 2460
ACAGGAGACG GGGGAGAACT GGCCGCCTGC CCGGGTAATA CAGGTAATGT TTTCATACTG 2520
AAGATTTTCC TGATATGCAG ATATAAAAAT GGGAAAGTGG CGTGGTGAAA ACACCAGGCC 2580
GTAGCAGAAG GCTATTCTGG AGAGTTAATT TTTCATTTCG GGCGTCGGAT AAACAGCCAG 2640
ATAAACGTAA CCACAACTGC TGAGGGTATC GGCTTTGCAG GTCAGCCCTT TTGCATACAG 2700
CGTGACGGTA TGCTGATGGC GGGGATTCAG TTCACCGCTG GTGAGCATGA GTTCCAGTTG 2760
TTTCATCAGC AGCGGAAAGG CCTGGTCCAG GTGGTACGCA TCTGCATTGC TGTATAGGCC 2820
TCTGATACCG GCGGGGTCGG CAAGGTAATG CAACCGGTTA CCCTCCTGCA CCAGACGTGC 2880
CCCGAAACAG GGCGTCACGG TGCAGGGCAG CCCCCACCAG GGGCGGTCGT GATTGTCGTC 2940
GGGAAGTGTT GTCCCGGGGA GTGTGTCTGA CACGATAAAA TCCCTACAGA AAATCGGCTA 3000
AGAATGCTCC GGTATTGGCG ATAATTCTGC TCATCAGAAT TCCCACTCAG TTCAGGGTGA 3060
CGCTCATCAG CCGGACATAC GGGCCAAAAC TGTCCTTACG GCGTTCAGCA AACACGGCCA 3120
GCACACCGGG AATATCCTGT ACTTCACGAC CGGTATACGC CTCAGCACTG CCGTGCCAGC 3180
GGTACTTACC GGTGCAGAAC GGAAATAGAC GGGATGCAGG ATGCTGTTGG TGAATACGCA 3240
TGGCTTCACC ACGGGTGATG ATTTTCATAA TGGGATACCT CTGAAGACAG AAGATAAAAG 3300
TGAAAACAGG TGTGATGTGG TTGTGACGGT GACGGGTTAA AGCAGACCGT GTTCCGCAAA 3360
GGAGAAAACC TGACTGCCAC CAACTATCAG ATGGTCCGGT ACCCGGATAT CCACCAGGGC 3420
CAGTGCCTGT ACCAGACGTT CCGTGATAAG GCGGTCTGCC TTACTGGGGG TGACTTCACC 3480
GGACGGGTGA TTGTGTGCCA GTACCACGGC GGCGGCATTG TGGTACAGGG CGCGTTTAAT 3540
CACTTCCCGG GGATGGACTT CCGTGCGGTT GATGGTGCCG GTGAAGAGGG TTTCACCGGC 3600
AATCAGCTGA TTCTGGTTGT TCAGATACAG TACCCGGAAC TCTTCACGCT CCAGTCCCGC 3660
CATCTTCAGA ATCAGCCATT CCCGTGCCGC ACGGGTGGAG GTGAAGGCCA CGCCGGGTTC 3720
ATGAAGATGG CGGTCCAGGG TTTTCAGGGC CCGCAGAATG AGACTGCGCT CGCCGGGCGT 3780
CATCTCTCCG GGCAGAAAGG AAAGTTGTTG CATTGTGCTT CTCTCCATTC AGTCGATGAT 3840
GCGCATAATG GCGCTGCATT CCGGATGCTG CAGGGCGTAA TCCCGCAACC GGTAATAATG 3900
GATCGTCATG GCATAACACT CCGTACGACA GGCATGATGA CTGTACGTCA TCAGACAGGC 3960
GGCAATGCCG GCGGCTTCCG GGCTCATTTC AGCGCGGTTA CCGTTCATGG CATTGAACAG 4020
TACCCAGTTT TCGTCATCAT CGTCATCCGG TTCGGGTGCC ATAAATGCCC CGCCGTTGTT 4080

CAGGGTGTAC AGATTCCAGA TACCACCGCA GTAGTCTTCG CACAGACGGT CCATCCAGCC 4140

GAAGACACGG GGCTCCAGGG TCACCCACTG TGGAATGAGG CCAAAGTGCT GCGGCCAGAA 4200

GCTGATGCGC TGTTCATCAG GGACTATGGT GGCAACCAGC TGAGGCTGGT CATTCCCTGA 4260

TGCAGCGGTT ACGGAAACAG AAGGAGTGGT GGAATTATGC AAGACGGTTG TCATGAGATT 4320

ATTCCTTATA AAAAGTAAAT GAATGGAAGA AACCCCGGGG GAAGGGACAG ACGTGAGTCA 4380

GAACTGCGCT TTCAGGGAAA CGGCATCAGC GCATACTCTC CAGCAGCGTT TCAGCCATCA 4440

CCCACAATGC GCGGTTGAGC TTAATGTCGG TGTCGATGCT GTGAATGGCA CGGGTATGGA 4500

TACGTTTTCC TGTGGCACTG CGACCGGAAA TTCCGCCTTT CAGCATATTC TCCTGAATGG 4560

TCTGATAAGC ACTCCACAGG TCCTTACCGT AATCCTCCCG GCGTCGTGGT GTCAGAATGT 4620

CGGCGGTGGT GACGGGCTGA TGTTCGTCAC CATAACGGTA AGTCAGTGCC GCCTGTGCCA 4680

GCGCCTGGCG TGCCGGTGGC GGCAGAATCA GCGACTGCAT GGCATCACGC TTTTCCTCAA 4740

TCCGGTCAAA AACCCCCACC ACCTCGTAAG CCCCTTCAAT AACTTTCTCC ACCACATTTC 4800

CCCGGTGCGG AACACGCACT TCCCCCAGAG ACTGACCACA GACGCATCCG TTCTGGCAGA 4860

CGAACCTGAA GTAACCCGGC AGCATCTGGT AGCTGGAGGT ACCGTCATGA GAGTTGAGCA 4920

GAATAATTTC AGGGACATGT TCTCCGTTTA TCTCTCCGGC CCGCCGCAGA CGCAGCATGT 4980

GTTTGGTGTA TTCCCGGCGG TCCGGGTCAC GTACGCGGGT CTGGCAGGCG AAGAATGGCT 5040

GAAAGCCTTC CCGCTGCAGG CTTTCCAGTA CGGTGATGGT GGGGATGTAC GTATAGCGTT 5100

CACTGCGGGA GGTATGCCGG TCTTCACCGA AAATACCCGG TACATGGTGC ATCAGTTCTT 5160

CGTGTGTCAG CGGACGGTCA CGGCGTATCT GGTTCGCATA ACCAAAACGA CTGGCTAGTC 5220

GCATAATTTG CTCCTTATCG GTGGTTAAGA TTTACTGGTG TAATAAATGA AAAAGCCACG 5280

TCTCCCGGAG AAGACGCGGC CTGACAGATG AAATGAATGA CGTTTATTGT CTGAGAAGCC 5340

CTTAACTGGC GAGCTGAGTA TTAAGCTGTG TTCCGGCATC ACCAGCGCAA CTGACCTTCA 5400

GCATTACGGA TAAGCAGCCG GGAATATGTT CCCTGGTCAT CTTCAGTAAA CACATTGCGG 5460

TAAGCTGTTA TGACAGCAAC CGCCTGCCCG TATGAGAAAG ATCCTTCAGC CAGGACATAC 5520

TCTGTGTGTA ACCCGGCATA TCTGGTTTCT CCTGATAAAT AGCCTCTGCC ATACGTTGTG 5580

GCAGAGGCTG AAGCATGAAA CTGACTTCAG GGATCAGTTA ACATTTTTTC CGGAAACGGT 5640

AATCAGCAGT GGATGGTAGT CCTGGGGATC GAAAACCGAT AACGGCAGAC TGACACGATG 5700

GCCGTTACTT TCTTCAGTTG CTTTAATGAT TTCGGTTGTG GCGACATTTT CCACGCACTC 5760

CGTTTCCAGA AATGCGTCTG TGGTTCGCGT GGCATTACTG TCACCAAAGG CTTCCGTTTC 5820

CATTTTTCTG GTCACCAGCG TCTGACCATA TTTGTCTTTG AGTTGCAGAG TGATGGTGAG 5880
GGGGCCAAAT CCTTCATCGT TTGCGCCATT ATCCAGCCGG AACTGGTAAG CACAAATATT 5940

TCCCGGGAGC CATATCGTAT CTGTATTGCG TATACTGATG TAACGTTGAT CCTGTGCCCG 6000 

GAGTGGGGCA GACCACGTTA ACCCCAGAAT GAAGGCGGTA ATCATGCAGG TTTTGAACAG 6060 

GTGAATCATG GTATTTACCT CTCTGAGTCA TGACGATTAC ACTGACAAAT CAGGTGATAA 6120 

AACGTAAAAG GCGCAGAATA GCCGTTATGC CGGTAACTCC GGGGGTAATG TTTCTTCCAG 618 0 

TCGGTTAACC ATATTGCCGA GATGGGATGC ATCATATTCC ATGACGGGGC GTTGCCTGAT 624 0 

GATACTGACC ACCAGTGGTT TGATTAACAT GTTGGTCGCG GCCCGTTGTT GTATACCGGC 6300 

GGCGAAAATG ATC 6313 
(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 432 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 

CGTTGGCCGC TTGCGCAGAT AAAAGCGCGG ATATTCAGAC GCCAGCACCG GCTGCAAATA 60 

CGTCTATTTC AGCAACACAA CAACCAGCTA TCCAGCAACC GAATGTCTCC GGTACCGTCT 120 

GGATCCGTCA GAAAGTCGCA CTGCCGCCTG ATGCTGTGCT GACCGTGACA CTTTCTGACG 180 

CGTCGTTAGC CGATGCACCG TCAAAAGTGT GGCGCAGAAA GCGGTGCGTA CTGAAGGTAA 240

ACAGTCACCA TTCAGCTTTG TTCTGTCATT TAACCCGGCA GATGTTCAGC CGAACGCGCG 300 

TATTCTGTTG AGTGCGGCGA TTACCGTGAA TGACAAACTG GTATTTATCA CCGATACCGT 360 

TCAGCCGGTG ATCAACCAGG GCGGAACTAA AGCCGACCTG ACATTGGTGC CGGTACAGCA 420

AACCGCCGTG CC 432 
(2) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3494 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 

GGGCTGATTA CGATTTTATC AATCTGTCTA TAGAACATGA ACTGAATGAA GGAATAGCTG 60 

GCAGAGAGAG GTTATGCCGG ACTGGCGGAT AACCGGAACC GGTTGGCAGA GGTGGTTACC 120 

CGTAAATTGC AGGACAGCTT TTATATGAAC TTTCCTGGGA TGCGCTGAAC ACGGCATACA 180 

GTGAACACCC AGAGTGGTTT TCCGGGCTTG TCTCCGGGGA TGAGAATTAA AAAGTGGATT 240

ATGCTGCTAT AGCGCGGCGT GATTTCCTGC AGGGATTTCC ATTTATAAGA ATACGCCGCT 300
TCGGGGAATC TCCGGTTCTC CTGAGAGTTA CGATTGTTTT TTTACTCAAA TGGACAACAC 360
CTGAACTGGA ACTTGTGTTG CATCCCTGAT TGTTAGTCTG CAGGAAACAT CTTTTTTACC 420
ATCAAAGGAT GACTGTTTTC CTTTCTCCCC TCCGTAAAAC ACAACTTCGA TCACATTTCT 480
GACATTTTTT CCAGATTTTA CATAACAGGA TTGTTTCTGT ATGTTTTTTA TCTGGTGTAA 540
ATTTCAGCAC TGACATTCCG CTTACGTTAA TTTACACTGA ATACCCGACG AGGAGAATAT 600
GCAGCACCGG CAGGATAACT TACTGGCGAG CAGAACGTCG TTGCCTGGTA TGGTTTCCGG 660
TCAGTGCGCA TTTAAGCTCC GCACTTTCTC TCCGGTGGCA CGCTATTTTT CCCTCCTCCC 720
CTGCCTTTGT ATTCTTTCGT TTTCGTCTCC GGCAGCCATG CTGTCTCCGG GTGAGCGCAG 780
TGCAATTCAG CAGCAACAGC AACAGTTGCT GGATGAAAAC CAGCGCCAGC GTGATGCGCT 840
GAAGCGCAGT GCGCCGCTGA CTGTCATACC GTCTCCGGAA ATGTCTGCCG GTACTGAAGG 900
TCCCTGCTTT ACGGTGTCAC GCATTGTTGT CCGTGGGGCC ACCCGACTGA CGTCTGCAGA 960
AACCGACAGA CTGGTGGCAC CGTGGGTGAA TGAGTGTCTG AATATCACGG GGCTGACCGC 1020
GGTCACGGAT GCCGTGACGG ACAGCTATAT ACGCCGGGGA TATATCACCA GCCGGGCCTT 1080
TCTGACAGAG CAGGACCTTT CAGGGGGCGT ACTGCACATA ACGGTCATGG AAGGCAGGCT 1140
GCAGCAAATC CGGGCGGAAG GCGCTGACCT TCCTGCCCGC ACCCTGAAGA TGGTTTTCCC 1200
GGGAATGGAG GGGAAGGTTC TGAACCTGCG GGATATTGAG CAGGGGATGG AGCAGATTAA 1260
TCGTCTGCGT ACGGAGCCGG TACAGATTGA AATATCGCCC GGTGACCGTG AGGGATGGTC 1320
GGTGGTGACA CTGACGGCAT TGCCGGAATG GCCTGTCACA GGGAGTGTGG GCATCGACAA 1380
CAGCGGGCAG AAGAATACCG GTACGGGGCA GTTAAATGGT GTCCTTTCCT TTAATAATCC 1440
TCTGGGGCTG GCTGACAACT GGTTTGTCAG CGGGGGACGG AGCAGTGACT TTTCGGTGTC 1500
ACATGATGCG AGGAATTTTG CCGCGGGTGT CAGTCTGCCG TATGGCTATA CGCTGGTGGA 1560
TTACACGTAT TCATGGAGTG ACTATCTCAG CACGATTGAT AACCGGGGCT GGCGGTGGCG 1620
TTCCACGGGA GACCTGCAGA CTCACCGGCT GGGACTGTCG CATGTCCTGT TCCGTAACGG 1680
GGACATGAAG ACAGCACTGA CCGGAGCTGC AGCACCGCAT TATTCACAAT TATCTGGATG 1740
ATGTTCTGCT TCAGGGCAGC AGCCGTAAAC TCACTTCATT TTCTGTCGGG CTGAATCACA 1800
CACACAAGTT TCTGGGGGGT GTCGGAACAC TGAATCCGGT ATTCACACGG GGGATGCCCT 1860


GGTTCGGCGC 


AGAAAGCGAC 


r~* 7\ r~* r~ c r~ i\ A A A 


(oLjIjLjA^HL, 1 - i 


u L L 1 f\r\r\ I 


C ACTTCCGGA 


1920 


AATGGTCGGT GAGTGCCAGT TTTCAGCGCC CCGTCACGGA CAGGGTGTGG TGGCTGACCA 1980
GCGCTTATGC CCAGTGGTCA CCGGACCGTC TTCATGGTGT GGAACAACTG AGCCTCGGGG 2040
GCGAGAGTTC AGTGCGTGGC TTTAAGGAGC AGTATATCTC CGGTAATAAC GGTGGTTATC 2100
TGCGAAATGA GCTGTCCTGG TCTCTGTTCT CCCTGCCATA TGTGGGAACT GTCCGTGCAG 2160
TGACTGCACT GGACGGTGGC TGGCTGCACT CTGACAGAGA TGAGGCGTAC TCGTCCGGCA 2220
CGCTGTGGGG TGCTGCTGCC GGGCTCAGCA CCACCAGTGG CCATGTTTCC GGTTCGTTCA 2280
CTGCCGGACT GCCTCTTGTT TACCCGGACT GGCTTGCCCC TGACCATCTC ACGGTTTACT 2340
GGCGCGTTGC CGTCGCGTTT TAAGGGATTA TTACCATGCA TCAGCCTCCC GTTCGCTTCA 2400
CTTACCGCCT GCTGAGTTAC CTTATCAGTA CGATTATCGC CGGGChGCCG TTGTTACCGG 2460
CTGTGGGGGC CGTCATCACC CCACAAAACG GGGCCGGAAT GGATAAAGCG GCAAATGGTG 2520
TGCCGGTCGT GAACATTGCC ACGCCGAACG GGGCCGGGAT TTCGCATAAC CGGTTTACGG 2580
ATTACAACGT CGGGAAGGAA GGGCTGATTC TCAATAATGC CACCGGTAAG CTTAATCCGA 2640
CGCAGCTTGG TGGACTGATA CAGAATAACC CGAACCTGAA AGCGGGCGGG GAAGCGAAGG 2700
GTATCATCAA CGAAGTGACC GGCGGTAACC GTTCACTGCT GCAGGGCTAT ACGGAAGTGG 2760
CCGGCAAAGC GGCGAATGTG ATGGTTGCCA ACCCGTATGG TATCACCTGT GACGGCTGTG 2820
GTTTTATCAA CACGCCGCAC GCGACGCTCA CCACAGGCAG ACCTGTGATG AATGCCGACG 2880
GCAGCCTGCA GGCGCTGGAG GTGACTGAAG GCAGTATCAC CATCAATGGC GCGGGCCTGG 2940
ACGGCACCCG GAGCGATGCC GTATCCATTA TTGCCCGTGC AACGGAAGTG AATGCCGCGC 3000
TTCATGCGAA GGATTTAACT GTCACTGCAG GCGCTAACCG GATAACTGCA GATGGTCGCG 3060
TCAGTGCCCT GAAGGGCGAA GGTGATGTGC CGAAAGTTGC CGTTGATACC GGCGCGCTCG 3120
GTGGAATGTA CGCCAGGCGT ATTCATCTGA CCTCCACTGA AAGTGGTGTC GGGGTTAATC 3180
TTGGTAACCT TTATGCCCGC GATGGCGATA TCACCCTGGA TGCCAGCGGC AGACTGACTG 3240
TCAACAACAG TCTCGCCACG GGGGCCGTCA CTGCAAAAGG TCAGGGCGTC ACCTTAACCG 3300
GCGACCATAA AGCGGGAGGT AACCTGAGCG TCACAGCCGG AGCGATATCG TTCTCAGCAA 3360
TGGAACGCTT AACAGCGACA AGGACCTCAG CCTNGACCGC CGGCGGCAGA AATTCACTCA 3420
ACAGAATGAA AAACTGACTG CCGGCCGGGA TGTAACGCTT GCCGCGAAAA AACATCACAC 3480
AGGGTTACCG GCCA 3494



(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9319 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 

GNCCCAAGCT TAGGTTCGCG GCCGCAGTAC TGGATCTATT GCCAGCTTCA CCGCCAGACT 60 

GTCAGTCAGT ACATCACCGT ATTTCTGCTG GCAGGTTGCC GGGCGGCTGC ACAGTCACTG 120 

ATCAGTTGCT TCTGCTGTGC CGTACTCAAC TCTTCGTACT TTTTGATAAT ACCGCCGCAG 180 

TCACCGCCTT TCGCCTGACA GGACTTCATT TCAGCAGAGC AGGCATCTAT CTGCTTATTG 240
CTCAGGTAGT TATTCTCAAC AACAACCACA GGGGATTAGA AGCCTTTTAG CCTGAAATAT 300
TTTGCGAGAG CACATCCAAT ACCAATAAAT GAGCCAATCA CACATCCGAT AAACAAAACA 360

TGCCGAATCT CTTTCAAACT AATATTTAAA TTACCTGTTA TCAACCACTC CACCAAAGAA 420

AAAAACACAT CAATACATAG GAATGACACC ACTATAGAAA GAAATGCGAT TATAAAAATA 480

ATAAACAATT CTGATAAGTG CTGAGAATTG CCGCTCATTT TTTCACCTCC GGAATGTAAG 540

ACTCAATCTT TTTACCTTCA TACTCAGAAG CAAAAGAAGC CGACACATCC CCAGCTATAC 600 
CAGGAATCCT ACTGGGTGTC ATTTCTTTTG ATAGCCCCAA TTCTCCTTTA ATATCGGTAT 660 
ATTTTTGAAG TGTTGGATTA AATTTCGGGT CCCAGCCGTC TTTTAACCAG TTAGCACCAC 720 
TATTAATGCC CCATGAAAGG CCTTTACCAA TGCCATATCC AATAGCAGAA CCAGCACCAT 780 
TGATCAACGC ACCAGATGTT GGGGCTTTTC CTTCGAGCCA GTTTCCTAAT GCTCCTCCAG 840

TTGCATTCCA GCCAACTGTG CCTACAACTC CATTCCCTGC ACTAATCACA TTAACCCAAC 900 
CACCGATAAT CGCTGTTGTA GGATCTATAG TTCCATCCGT CAGATAGCTA ACACCTGCAT 960 

TAGCTCCTGC CCCTAATCCC CACATGGCCT GAGCACCGCC AGTAAGAGAG CTACACTACC 1020

AGTGGCCAAC GCTCCGGCAT ACGCTTTATT GACTGCTTCT CCTCGCTTAC AGGCTTCACC 1080

GCCTGGGGCA TCGTTACAGG AAAGTACATC TGCGCCATGC GTCTGAGCAG CTTTGCTCTG 1140

CTCGGACTCT GTGCCACCAA CCAGGTTATT CTCAGCAATG TTCTTCCCGA CACCAGCCCC 1200 

AGCAGCCGCG CCAGCCACAT CGCCACTGGC AATGCCGCCA GCCATACCCG CTGACAGCGT 1260

TGCCAGCGTG CTTACGGTTT GCTTCTGATC TTCTGTCAGT TTCGACGGAT CTACGTCCGG 1320 

ATAGAGGCTT TTCGCAATGG CTGACGAGAT CACTTCACCA GTACCCGCAC CAATTGCGCC 1380

TGCTGCCGCA CTGTTGCCCT GAAGGGCTGC TGTCACACCA CCGAGAATGG CATGGGCAAT 1440

GGCTTTTGCC GCTGTATTGT CATCAATACC CGCGTGATGA CCGATGATGT TCGCCAGCTC 1500 

CGGCGCCGAA GCTCCGGCCA GAGCACCTGC TAAATTACCC CCCGCCAGCC CCTGAAGTGC 1560

AGCCGTTGCA GCCTGGATAC CGCGCTGCAT ATCGCTGCCG GTACCATACT TTTCCTGTTC 1620 

CTTTTTGTAT TCCGGCGTAT CACGCAGTTT TGCCAGATAT GCCTGCCGCT GTTCTTCCGT 1680 

CGCATCCGCC GGAACAGGCC CATATTTATC CTGCGCAGCT TCAACGCATT CAGTTCCCCC 1740

TGCGTCCGCG CAATATCCGC CACCTGACTG CCTATGTCAC TGATAAGCCC CACTGTCTGC 1800 

AGACGCCTCT GCTCCTTCTC CTTGTCAAAT ATCGGGCTGA TACTGTCATT AGCGTGCGCA 1860

GGGTCACGGC TCAGGTTCGC CAGATTCTGC TTCTGATTGC CCCTGTCCCG GATGGTGATA 1920 

GTGCCTTCTG CCACTGCGGC CTGAGTCGTT CCTTCCGCAT GTCCGCTGTG ACCTCCGGCG 1980 

GATATCATGC CACCCGGCAT GTTACCCTGA AATTTATCCC CGAAGCTGCC ACCACCGCTC 2040

AGACTGATTC CACTGTGACT GACTTTATAA TCCGCTTCGT TGTGAAGGTC ACTGAACCCC 2100 
AGCGTTCCGG TATCCAGGTG GTTTTTATCC GGTGTGGCAG TGGAGGCAAT CACCGCACCA 2160 
TCCAGTTGGG TATGTTTACC CACTGTGATG TCGAAGCCGC CGTCACCGGC AAACATTCCG 2220

GTTTGTTCAG CAACGGAGTC AAAGCGGCTC TTCATCTTAT CCCGGGAGGC AGCGATGTAA 2280

CCTGAGCCGG TCATGGAGCC AAAGGTAAAA CTGCCGCCGG CASCCACGCT GGTCTGTTTA 2340 

CTGTCGTACT TACTGGTGTC CTGCTGGCTG CTTATCAGCA GGTCGTGGCC CACATCGGCG 2400 

ATAATCCTGT TGCCGTTGAC CTGAGCACCG TTCAGTACCG TATCCCGACC ACTGTTGATG 2460

GTGACGGTTT TACCGCTGTC TGTTGTGGTT TCAGTCCACT CAGTACCGTT ACCTTTCTCG 2520 

CTGCCTTTTG CCGCATTAAC GCTGGCAAAG ACACTGATAC CGGCACCTTT ACCTGCACCG 2580 

ATACTGACAC CCACGCCACC GCCA2TGCTG CTGTTCCTGC CCGTTGTTTT TTGTGTGTTT 2640

GCCGCGCCAC TCAACAGAAC ATCATTCGCA GCATCCAGGT TTGTGTTACC ACCGGCCTTA 2700 

AGCTGGCTTC CGGCAATCAC AATATCTCCG CGGTTATCGC CCCTGTTTTT ACCGGTTGCG 2760

ACAACAGACA GATTATTCCC GGCATTCAGC GTACTGCCGG ATACTGTGTC ACTTTCAGAA 2820 

TGTTGTTGTG ATTTCGATTT CTGGGTGGTG AGCGACAGGC TGACTCCCGT CGCATTCGGG 2880

TCACCGGTTG CGGAGGCGAT TGCCGCAGCC TGTCCGGCCT GCACACCAGA CAGCGCTGTC 2940

TTTGTAGCCT GCAGGGTTTT CAGACGGCTG TCACTGCTCT CCTTCGTCTC CTGTGCACTG 3000 

GTGACCGCAT TATTGATGGC ACTGCCCACT GTGCCGGAAA GGGCAACCGT CAGCCCGCTT 3060 

TTCTTCTGCT CAAATTTTTC GTCCACAGTA CGACGGTCAT GCCCCGGGTC AACCACCACA 3120 

CTGTCACCGG TAATGCTGAT ATCCCGGTTC GCAATCACAT CCGAACCGCT GATATGAGCC 3180 

TGTTTGCCCG CGGTAATACT GACATTACCG GCAGTGGAGC CGATGGTACT GGCACTCTGA 3240

CTCTGCGTTG TCCCGGCCTC GCGGCGGTCG TGCGTTGTCT TACTGCTGCC AATGGTGAAG 3300

CCAATACCGC CGGTACCCAT CAGACCGGAT TTCTTCGTTT CCTTAAAGCG CCAGGACGTA 3360

TCTGTACTGG TGGCAGCAAG AACATCAACA TGGTTACCCG CCGCCAGTGA CACATCCCGG 3420

TCAGCCACCA CATCCGAACC CTCTACCGTC AGGTTATCAC CGGCGTTAAC GGTCACGCGG 3480

TTCCCCGACA GCAGGGAACC TGYTTCACGG GAGGCACTGT CCTCACTGAT GGTGTGGGTG 3540

GTTTTCTTAC TGAGAAAACC TCCGCTTTTT TTCTTCGTTT CCAGATAGTG ATAGTCACTT 3600

TCTGTCGCCG TGGTCAGGGC AACATCACGA CCGGCATTCA CGCTGATATT GCCGGTTGCG 3660 

GTAACGGATG ACGCAACAGC GGTGATATCC CGTCCTGCGG TGACGGTGGT GTCACCACCK 3720

CTGGCGATTT CCGTTCCCTG CTGACGGACT GTCTCGTTAA TCTCTTTCTT TTTCTTCGAC 3780 

GTATAGCTGT CGCCTGCGCC GGCAGACTCT GCCACCAGGT TCACATCACG TCCGCCCCGG 3840

ATGACCACGT TATTTTCCGC AGCCATACCG GCAGCCTGAC TGGCAATATC ACGACCGGCA 3900 

ACAAGGAGGA GGTTATCGCC CGCCGTCACC GTGGACACAG CTGCGTGGCT TTCATGACTT 3960

TCTGACCTGC CGTTGCGACT GTTTTTGCTT TCCCTGACTG CATTCAGACT CAGGTCGTTA 4020

CCTGCAGAAA GCAGGGCGCT GTGCCCGGCA GAAACAGAGG ATGCTGTGAC ATCCAGATTA 4080
TGGCCTGCAG CCATCGCCAG GTTACCGCCG GCGCTGATGC TGCTGCCCTG TGAGGTGGTG 4140
GATGATGAAC TGTTGTCATC AGTGTGCCAG AAACCGGACT GACTTTTGCT CCCGCTTATC 4200
AGGTTTACGG CAATGTTGAT GTCATTACCC GCAGACATTC CAAGGTCTCC ACCGGACGAG 4260
ACCGTTGCCC CGGTAATATC AATGTTTTTC CCTGCATCCA GTGAAAGTGA ATCAGTGCCT 4320
TTAATGGTCG CAACCGGACC GGTGTCCGTA CCGCTGAGAT GCACACCACC ATATCGGCTG 4380
TCACTGCCCG CATTCCATTG CTGACGCCGG GTGATATTGC TGATGTTGCC ACTCACGGTT 4440
TCCAGTTGTA CGGTTTTACC GCTGATGACT GAGCTGATAT TGCTGATATC CCCGATGGCG 4500
CTCAGGTCCA GGCTACCGCC CGCGCTTATC AGCCCTGCAT TCAGGTTGTC GATATAGGCG 4560
GTACTGTCGA GCGAAAGGTC GTTCTGTGCG TTGATGCTGC CGCCGCTGTT GGTGATATTG 4620
CCGTGCGCAA GCTGCACGTT GTTCCCGCTG ATAACGCTGC CGTTATGCAG GGTGATATCT 4680
TCCGGCGACA GATACAGTTT CGGGACCATG ACTGTCTGTC CGTTGATGGT GACTGACTCC 4740
CACCACAGCA TGCTGCCGTC AAGCTGAGCA ATCTGTTCAG CTGTCAGCGC CACACCAAAC 4800
TCTAATCCCA GTCCTTTCTG TTGTCTGGCC GCGTTATCCA TCAGATACCG CATCTGTTCC 4860
GTGTCTGAAC CCAGTCCGTT GAGATAACGT GAACCCGTCC GGCTCAGCAC GGCGTTACTG 4920
ACATACCGGG TATCAAAGAC CGCATCCCCC AGGAAACGAT AATGTTTTTC CGGTTTCAGC 4980
CCGAGGCGGT CAAGAAAATA CGATGAGCCC AGAAACTGTT TTTCATCGGT ATACGACGGA 5040
GCCGTTTCAC GTGGCGCCTG ACCCGGTTTC GCTCCAAGAA GCTCATACAG TCCGGCAAAC 5100
AAATGGCTGT CCACCTGTCC GAGACCATCC AGTTTCGGGT TCACCGTAAT CAGATACGGA 5160
CTGTCCGGGT CCGTGGACGG AACCAGGTAT CCATTGTTGC CGGAAGGCAG TGGCCAGTCA 5220
TCACTGATAC CGGTCTGACC GGTCAGTGGC GAACCTCCGG CAATATTTTT CAGGGCACCT 5280
GCCAGTTCAT CGTGCCATTG CGGAGAGCCA ACCACCACCG GCTCATACTG CTGCAGCGCT 5340
GTCTGTGTCA GACTGTCTCC GCCGGTCTGC TGACTTAACG TATTCAGTAC AGGTGCAGAG 5400
ACCACCGGAC TGACACTACC TGCATGTGCA GTGGTTGTTC CGTTATTGAT ACTGCTGGTA 5460
AAACGGGTCT TAACATCCCC GCCCGCCTGA ATAACGGAAT AATACGTCTT ACCGGGCGTG 5520
TAATCTTTTT CCCGGCCATC CAGTGAAAAT CTGATGGTAT TGTTTTCAAA TTCCGGTGAC 5580
AGCAGGGGCA GTTTATCCAG AGAGCCTGTT GCATAGCTAC CGTAAAACGT TTTCGGGTCG 5640
ATTCTCTGTC CCCGTCTGCC AGCTCTGATT GCTTAACTCT 5700
CTGCCCGAGA GTGCGATATC CCCATTCGCC AGGATAAATG ACGCCCGGTT TTCCAGTCGT 5760
TCAGCCTCAG CAGAAAGATT ACGCCCTGAC GCAATGCGGC CTGCCGGATT ATCAGCACCG 5820
GTTACTGTTG TGATGTTCT3 GCTGCTGAGA AAGCGCTGTG TGGCACTGTC AGCAAACGGA 5880
GCGTAATAAT AAAGCGTATC CATTGTGATA TTGCATGCCC CGTGCCCGTT GCAGGGCGTA 5940
CCGTGCTGAT TTTCAACTTC ACGGGTGAAA TAGCCATAGC TGGCGTCAGG AAGAAGGGAA 6000
AGGGGAATAT CAACCAGAGC ATTTCCCATT CCCTGAATGG ATGAGGGGTT AGTGCGGGTT 6060

GTTGTTGTGG CAGAAAATCC CTCCCGCTGG TTCAGAAGAT GC2CGGTTCT TACAACAATA 6120 

TCGCCCTGAT GGGTCTCAAT ATTCCCGGAA GTATTGATAA TCTCTGTGTT TGGACCGCCG 6180 

GAAGCATCCT TCTGTACCCA CAGACTGTTG CCGGCCAGGA TATCACCATG CTGGTTATGC 6240

AGACGGTCTG TAAACAGCTT CAGGTTATTC CCCGCATAAA TCAGCGCACT GTTCAGCAGG 6300

GTACCGGCCA CATTCATTGT CAGACTGCCT GCCGTGCCGG TAAAACCACT GATGGTGATA 6360

TCACTCCGGC TGTTCAGACT CACATCGCCA CCGGCCTGAA GTGAACCCGG TGGGTTAAGG 6420

AAAAGACGCT GTGCGCTGAA AACACTGTTG CCTTTACCGG CAGTCAGCGT TCCATTGTTG 6480

GTGAATGCCT CTCCGGCACC GAGCACCATG GCATCACCCT GCATGACACC GCCGTTGGTG 6540 

ATGGCATTTT GCGACGTGAC GGAAAGGGTT TTCCCTGCGG CGAGGGTACC GTAATTCGTG 6600 

AGGGCAGCAA TCAGTTTCAG TGTGACATCA CCGGTGGCCA CCACCTGCCC CTGACCACTG 6660 

AAGTCCTGAG CGTCAAGCAG CAGGTTGCCT GCACTGTACA GCCGCCCTGT ACCATTTTGC 6720 

AGCAGTGAAC TGCCCTTGAC GCCAAGCCCG GAGGTTCCCA GCAGGGTACC GCTGTTGCTG 6780

AATGTGTGGT AATTCACCAG CAGGTCCGCA CCCTGAAGGG TACCGGTATT ATTCAGCGTG 6840

GTTCCTTTAA CGTCGGCACT GCCGGTGGCA AGTACGCGTC CGCCGTTGAC AGTATTCACC 6900

ACATCCAGCA GCAGGGTGGC AGCCTGTACC AGTCCGCTGC CGGTGTTCGC CAGCACCTGC 6960

GCCGTCAGCG TGAGGTTACT GCCGGAGAGG ATTTTGCCGT CGTTCTGCAG ACGGTCAGTG 7020

GCGTTCAGGG AAACCCCGCC ACCACCCTGT ATCGTGCCCT GGTTACTCAG GGTCGCAGTA 7080

CTGACATTCA GTGCATTCCG GCTCATCAGA ACACCACCGG AACGGTTGTT CACGCCACCG 7140

GAGGCGGCCA GCGTCAGCGT TTCGCCCTGC AGATGCCCGC CGTTTGTGAG TTGTCCTGCC 7200

GTGATGGTGG TGGCATTTCC CTGTAATTGC CCGTCGTTTG TGACACTGTC TGCCTTCAGC 7260

GTCAGCACAC CTGCACTGAG CAGTTTTCCG CTCGCGTGAT TGTGCAGCGT CTGATTCACC 7320

GTGAGCGTGA GAGCATCCAC ACCGGTGATG TCACCCGCAC TGGTCAGTGA GTTCGCCTTC 7380

AGGGTCAGAT TTTTTGCAAT CCATTGTCCG CTGTTGCTTA AATTCAGTGC ACTGAGCGCC 7440

ATTTCACCGT TCGAGGTGAC TTTGCTGCCT GCTGTGCTGA CGAGCTCACC CGTCAGACGT 7500

GCAGTCAGGC TGTCAGCCGC CTGGATCGCC CCGCTGTTTG CCAGACTGTC TGCGGTGATC 7560

AGCACCCGTT TGCCCTGCCA GTGTCCGGAA CTGGTAATAC TGCCTGCGGT GATTGTCAGA 7620

TCGCCGCTGG TCAGCAATGA ACCTCCGTTA TTCATCAGCG CAGGTTGAGG GGATGCCATA 7680

CGGGCGGCAA GCGTCAGCGC GGCTATCCCG GTGAGCGTGC CACTGTTGGT GACACTGTTC 7740

TGGCGAATCG TGACATGGTT ACCCTGGACA GTGCCGCTGT TATCCAGTGA GTTTCCATCA 7800

AGGGAGAGCG TGCCGGCCGA AAGCAGACTG CCCCGGTTGT CCATGGTGGC TGCTTTCAGC 7860

GTGGTGTCAC CCTGGCTCAT GATATCGCCG GTACTGGTCA ACTGACCGGT TGCCGAAGCA 7920

WO 98/22575

PCT/US97/21347



GTAAGGTTAC CGGTTGCCAG CACGGAACCA CTGTTCGCCC AGTTGTCCCG CYTGCACGGT 7980
GAGATTCTGT CCCTGCGTGG TCCTGCGGTA TGCAGTGTTT TACCCCGGAG GGTGAGGTCG 8040
CCCGCCGTCA GCCAGCGCCC GTTACTACCC TGTGAGAGGG TGTGGCCAGC AAGCGCCAGT 8100
GCACCGGCGC CCTGCAACAG GCCGTCACCA TCCAGCGTGG TCGGGCTGAC GCTCAGCGTG 8160
TCAGCGATGA TTTTTCCCGG ATTGCTGAGG GAGACAGCAT TTAACATTAA ACCATTATCA 8220
CCGGTGATAA GCCCGCTGTT GCGGATGTCC GGTATATCCA GCGTCAGGTG TGGAGCACTG 8280
TACAGCGTGC CGTTCTGCTG ATTATCAAGC CTCTGTGTGT TAAGGGTAAG TGAGGCCTCC 8340
CCCTGCAACA GACCGCTGTT GGTCAGGGTG TGTGACTGTG TATTCAGGGC GGAACCAACA 8400
AGTACGCCGC TGCTGGTCAG TTCCGGCGCA CTGAGGCTGA GCGACGGGGC ACTGGTTTTC 8460
CCGCTGTGGG TGAGCTTTTC ACTGGCGTTC ACCACCATGG TCTGTTGTGC TGCCTGCGTA 8520
CCTGCAAGAC GTGCATCTCT GGCGTTGATG CTGAGATTTT TACCGCTCTG AAGGTGTGCG 8580
CCCGCTGCGG TACTCAGTTT GTCTGCCTGA ACCCGGAGGG TGTGACCGGG ACTGTTTTCC 8640
CCGTCCAGCG CCACTGTTGT CACATTCAGC GTCATCGCAG CATCGCTGTG GGTGACCGAT 8700
TTTTTACCGG AGCTCAGCGC CTGCGCACTG ACCGTCAGCC CTTTGCCGCC GGACAGCACA 8760
CCGTTCTGTG TCACATCCTG CGCCTTCAGG ACCAGTACAT CATCGCTCAC CAGCGAACCT 8820
GTACTGGTCA GTTTCCCACT GGCCGTGATA TCCACTTTGC CCTTCGCGCC AGTGCGGCCG 8880
CTCTGGGTAA AGTCGCGGGT ATTCACGGTC AGGGGACCGC CACTGAGCAG GGAGCCACTG 8940
TTGCTGAGCG TTGTACTGCC GAGCGTCAGG GAAGCCCCCT GAACAGCACC ACTGTTATTC 9000




^ M. 1 V_ oMo 1 LL 






ATATTCCGTC 


"TGTGTCAGC 


9060 


GTGGTGGCGC TGGCCGTGAG ATTCTGCGCG GCGGTTATCT GTCCCTGTGT TGTCAGCGTG 9120
TCACTGGCGA CAGTCACGAT ATCGCGGGCC GCGTTAATCT GGCTGGCGGT ATCCTGTGTG 9180
ATGTTTTTCG CGGCAAGCGT TACATCCCGG CCGGCAGTCA GTTTTTCATT CTGTTGAGTG 9240
ATTCTGCCGC CGGCGGTCAG GCTGAGGTCC TTGTCGCTGT TAAGCGTTCC ATTGCTGAGA 9300
ACGATAATCG CTCCGGGCT 9319



(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 551 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 
ATGAGGCGAT TAAAGCAACA TTGGGCAGTG ATAATGCCCC CACCCAGCCA CCTAACGCAG 60 
CGAAGAGTAA TACATCGCCC ATGCCTAATG CTTCTTTACG CAGAACTATT CCGGCTATCC 120 
AGCGSAGGGA GTAAAAAGTG ATAAATCCCA CCAGTACGCC GGTAACTGCG TCTTGTAGCG 180

TTAACGGACT CTGTTGCGCC CATGCTGCAA TCAGCCCGGT CCACAATACG CCCTGAGTAA 240

AAACATCGGG CAGCCATTGG TTGTCGAGGT CAATGACGCT CGCGGCAATC AGCCAGGCGG 300

ATAATATCAT CACCGCCAGC CCCCATCCAC TTTCTGGCCA CACCAGACTC GCCAGCAAAA 360

AAGTGAGTGC TGTCAATAAC TCAACCAGCG GATAAGGTTG GTGATTTTCG GCTGACAGTC 420

GCGGCAGCCC TTTGAGCATC AACCATGAGA GCAGCGGAAT ATTGTCACGA ACGCGGATGG 480

TCTGCTGGCA ATGCGGGACA GTTGCGAACC GGGTTAGCCA AGGGCTTTAT TTTTTGGACT 540 

GCGGCACTCG G 551 



(2) INFORMATION FOR SEQ ID NO: 87:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 595 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

CATTTACCAA ACCCCGTTCG AATATCTTAT CTATTGCCCA TCTCATATTA AATATAACCG 60 

ATAATTTGGT GGATACTAAT AGTAATTACC TTGTTATTGA AAATATAATT ATTGTTATTT 120 

TTAGCCTCAT TAATTAAATT GAAAAATCCT CTCTAATTTT TGTCAGATTA GGGCTGTAGA 180 

AAGGATCGAG TTCAAGATGT TTACCCCATT TGCTTTTCAT AAAGTCCACT TCCCTGGCAA 240

ATCTGGCTAG TTTCTCCGGT GAATCTTCGG CTCCTCGACT AATCGATTCA TAGTGGTAAA 300 

GCTCGGCATA AGGTGTCCAG AGATTACGAT ACCCCGCTTC GNGTACTTTC AGACAGAAGT 360

CCACATCATT AAAAGCAACA TGCAGATTCT CTTCATCCAA CCCGGCAACT TCCTCATAAA 420

TATCTTTGCG AATAAGCAGG CAAGCCGCCG TGACGGCCGA GAGAGTTTGT GTCAACAACA 480

AACGGCTGAA ATAGCCCGGA TGGTGGCGAG GATAATGTTT ATGGGAGTGT CCAGCTACAC 540

CACCAATACC GAGAATCACT CCGCCATGTT GTAAAAGTAT CATTACTGTN ATAGG 595
(2) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 399 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:
TGGCAGTTGA ACAGATTTTC ACATCAGCAA CAGATTAGCG AACGGGACTT GGCATTAGCC 60 
GAGCGTTTTA GTGAANGTTT AGCTCTAACA CGTCTATTAG AAGAGCGCAC GCAGNATTAT 120 
CACTGAACTA GAGATTGAAA AACAATTGCT TACCACCAAG TTGTCTGGCG TAGAGCAGCA 180

GTTAAGGGGT GAGCAAGAGT CGCTTCAGCA GGCCCAGTCT GGATTGCTCT CAGCAGCAAA 240

AGAAAAGCAA CATCAACTTG ATGAGTTGGA ATCGGTGCTC AATGAGCGGT ACAGTGAGAT 300

TGCAACCTTA ACCCGTTGGC TGGAAGAACG TGATCAGGCA GTCCTTAGTG CAGCAAGTGA 360

ACAACAACAG ACCAATGANA CCATATAGAG CTCAGCCAG 399



(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1013 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 



A 1 AL 1L lbL 1 


loll Lj/A^^M^j 




n, C T T T fi T c A r" 


GCAATATTAG ACT^GTGCAC 60

TGCTATTAGT TGAGTCAGTT CATCACATTG TTTAGAAGCC GCAGCCAAAG CAAGAGTTTG 120

CTCATCTATG CTTTGCTGCA ATGTTTGTTG CACAAGTTGC CCTTCTTCCA GCTGTTGCTG 180

TAGATTTGCA CTTACCTTTT TCAGTGCATC ATATTCCAAG CCTAACGTAT CGTGCTGTGC 240

TTCCAGTAAT CCATAAGCAT GCTGCAACTG GTTTTTAGTT TGCTGCTCAC CGTCAAGCTG 300

TTGCTGCAAT GCATTAGCCT GCTGTTGCAA CAAGTTCACC ATATTGTCTC GCTCGGCCAG 360

TGTACGAACC TGTGTATCCT GGATATGTAG CGCTTGTTCC AACTGAAGCT GTAATTCGGT 420

AATTTGCCGC GAATGTTCGC TCAATGCTCT GTTGCTCTTG CTGAGCGCGA GAGTAAGGTG 480

AGATGCACGC TGTGTTTCTT CACTCAATTG TAACGTCAGG GTATTGACCT GTTGCTCCAG 540

TTGATGGCGA GCTTGCTCCT GGCTCGTGAT GCGACTCTGT TGCTGCTCTA GTTGATGCAG 600

AGCTGTATGC AACTCATCGT TGGCTTGTAT TCGCTCCTGC GACCATACAC TCAAGTTTGT 660

TTGGGCCTCA TTGAGCTGTT CTTGCAATAA TGCCACCTCA GATGTCAGCG AATTGATATG 720

TTGCTGGGCA AAAGATAGCT CATCAGATTG CACTTGAGCA TGTGCAAGCT GCTTTTCCAT 780

TTCTAATATG CTGTTATGTT GTGCAGTAAT GCGCTCGGCA AGACGCCCCC TTTCCAATGC 840

CTGCTGTTCT ACCAATAGCT GCCGTTCAGC CTGAATGTCA TCTTGTTGTG TAGACAACTG 900

ACGTTTTAAC TGGGAATTCT CCCAACTCTC GCTACAAGAT TTNCCCAAAC GACAAAAGAT 960

GTCTTGGACT TGTNTGGGTT ACACGAGCAT TTTCTGAGGA TTTTATACCA ATN 1013



(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 689 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90:

GATATCCACA TCGAGACGTT TGAAAAGAGT CTGGTGATCC GTTTTCGTGT TGACGGCACA 60 

TTACATGAAA TGCTGCGTCC GGGGCGCAAA CTGGCCTCGC TGCTGGTGTC GCGTATCAAG 120 

GTGATGGCGC GGCTGGACAT TGCCGAAAAG CGCGTGCCGC AG3ATGGACG TATTGCGCTG 180 

TTGCTGGGCG GCCGGGCGAT TGACGTGCGT GTATCAACCA TGCCTTCCGC CTGGGGGGAA 240 

CGGGTGGTGC TGCGACTGCT GGACAAAAAC CAGCCTCGCC TGACGCTGGA GCGTCTGGGT 300 

TTAAGTCTCG AACTGACTGC GCAGTTGCGC CACTGTTACA CAAACCGCAC GGCATTTTTC 360

TGGTGACGGG GCCGACCGGT TCCGGCAAAA GCACCACGCT GTACGCTGGA TTGCAGGAGC 420 

TGAACAACCA CTCGCGTAAC ATTCTCACGG TTGAAGACCC TATCGAATAC ATGATTGAAG 480

GGATCGGTCA GACGCAGGTT AACACCCGCG TCGGCATGAC ATTCGCCCGT GGCCTGCGCG 540

CAATTTTGCG TCAGGACCCG GATGTGGTGA TGGTCSGTGA AATCCGCGAT ACCGAAACCG 600 

CAGAAATCGC TGTTCAGGCT TCAACTGGAC CGGACACCTG GGNACTTTCN ACGCTGGNAT 660 

ACCAAAAAAA AGGGGTGGGG GGATTATAC 689



(2) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1281 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 



CTCAGCAGAA CCGAGATCTT CCATCAGCTG GCGGGCCTCG GAAGANTCCC GCTGCCAGAC 60

CGCATTCAGC CGCTGTTCAA ATTCGGCCTC GTCGATTTGC CTCAGCGTAA AGGGCGCGTT 120

CAGCCCCCGT TGCAGCTCCT GCAAAACAGA GAGCGACAAC GGATGCACAT GGAGGATCTC 180

CAGCGACGCT TCGCACCATG CCACCAGGCT AAACCGACGG CTGAAACTAT AGGGCAGACG 240

CACGGTGTTA GCGGTGGTTT CCTGTGCTAC AGGCACCATT AACGCGTTCT CCCGGCATTA 300

AGGAACGCAC GAACTTCTGG CGGTAAGGCC TGATTTTGCG CAGGCAATAT CGCTGCGCAG 360

TGTGCGGCAT CAGGCTTAAG CCCTGCTCAT CGCGGTAGAT TTGCTCGGCG CGCATGTAGT 420

TATATTTGCG CTGCGACACA CCGTCTGCCG CCATACCGTC ACGCAGAATG GTCGGGCGGA 480

TAAACACCAT CAGGTTACGT TATCCGCCGT CGATTTAAAC AGGTTACCAA 540

TCAACGGGAT ATCGCCCAGC AGCGGCACTT CTCGCCACGC TTTCTCCCGC CTGGTCGTCC 600

ATCAGACCGC CAAGCACAAT TAGCTCACCA TCGTTAGCCA ACACGGTGGT TTTCAGTTTG 660

CGCTCACCAA ACACCACGTC GAGGCTGGTC TGTCCTTCCA CCTTCGACAC TTCCTGCTCA 720



WO 98/22575 PCT/US97/21347

-211- 



ATCACCATCT GTACCGCGTT TCCTTCGTTA ATCTGCGGCG TGACTTTCAG CATGATGCCG 780

ACTTTTTTCC TCTCTACCGT GTTGAAAGGA TTGCTGTTAT TGGAGCCAAC GGTAGATCCA 840

GTTAATACCG GAAGGTCCTG GCCCACCATG AAGAAGGCTT CCTGGTTGTC GAGCGTGGTG 900

ATGCTCGGCG TGGAGAGCAG GTTGGAGCTG GAGTGGTTTT TGACCGCCTG TACCAGCGCC 960

ATCCAGTCGC CTTTCAMCAC GCCAACGGCC GTACCGCTAA AGCCAGAAAG AAGCTGAGCA 1020

AGCGTGGAGA GATCGCCGTT AGTATCCGGA TTTATGGTGG TAGGGCCGTT TTCACTGATC 1080

ACCGTGGAGC CTTTCTGCGG TTTTGCYTGA GAAATCGTGC GCCCAGCGTA CCAATAGGGA 1140

TCTGCGTAGG GTTAGCAAAC TGCATTAATC CGGCATCTTT CGACGCCCAC TGCACGCCGA 1200

AATTGATAAT TCACCTTCGG CAACTTGCAC GATCAACGCC TCGACATGTA CCTGAGCACG 1260

GCGAATATCC AGTTGTTCAA T 1281



(2) INFORMATION FOR SEQ ID NO: 92: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 421 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 

CAATATTAGC GCACGGCACC AAAGGTGATG AATGAGCAGG CTGRAATATT ATTTTCCCGC 60 

GGTGCAGAAA TCCTTGTTCT TGGTTGTACA GAAATTCCGG TTATTCTGGC GCAACGTTAA 120 

AGAGCAGCCT TCCCGCTATA TTGACTCACG GCGTCACTCG TTCGTGCCGG AATAAAATGG 180 

TACGAAAATC GTGTCGGTAA ACATTATCTT TTAACCCAAT AATCATTTAA ATCGCAGCCA 240

GAAAGTTATT CG£TTTTAAC TGAATTATAT TTATAACGGA GAACATTATG GTTTGGCTGG 300

AAATTATCGT AGTACTTGGT GCAATAKTTT TTGGTATTCG CCAGGGGGGA ATCGGTATTG 360 

GTTTATGTGG CGGGCTTGGG CTTGCCATTC TGACTCTGGG ACTTGGTCTG CCTATGGGGG 420

G 421 



(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1018 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 
GTTAACAATG GCGTAACAAA TTTCAATAAC GTAGAAGATT TGCTGTCAGA AAGGTCAATA 60 
TTTCCTTTCA ATGGGTCAAA GACTTGCTTC TGGAATTCAT CCGGTTTTTT CTCCAGACGT 120 
TTTCCTTCTT CATAATAG7C AATATAACTT TTACCACTGA GTGTTTTGKC YCCATTTCTG 180

GTGACACCAG CTAACTCA" TATCAGCGTA TCCCMATGTT GCTGGGTAAT GAGGACTGAT 240

CTTTCAACAG AATACTCTTT ATTATACTGA GATAATATTT TAAAGTTATC TTGTAAAAAT 300

GCAGCATGGC GGGCATCATA TCCCATTTTC AAAGTAATTT TTGCCGTGTT TTTTGTCCCA 360

TTCAGCAATA ACATCGGCCA TTTTACTGGC GACATGTTCA AACATTGCCT GTTTTGAAGC 420

CTCAAGGATG CCTGAAATTA TCCCCGTAAC AGCCCCTACC AGCGCGCTTA CCGGTGCACC 480

AACCAGAGAT GTCGTTGCAG CAGCACTAAT ACCTGAAGAT ACTGAAGCCA GAACAGTGCT 540

TATCGTTGTT AACGATGCAT CAATAGCTCC TGTTTCTTTG TGGAAAGCAG CAAGTAAACT 600

GTCAGCATCG TATCCAAGTT TTTTGAATCG TTGTGAATAC TCCTCTATTT TATTGGCACG 660

TTTAAACTTA TCGGCAATGG ACAGGAATGA GAGGGGACTA ATTGCCAGTG TCACAACAGA 720

AGCAATTAAA CCGGCAGCAG CAGCAGATGT AGATAACCCC TGTGCTGCAC GCTGTGCGAY 780

NAATATATTG AGAAATACCT TTTCCAACAT TACCCAGTAC TTTCGTTGTT AATTCAACAC 840

CTGCTGCAGC TTTAGTTGGG GTATCTGCAT CTGCATTGCT CAGAATGAAA CTTGCTGAAA 900

TCGCAGATAA AATACCCGAT ACAGTATCTA ACCCTGCACC GATATTATCA AGGTTAGGTA 960

AATTCTGTAA CTTATTACCA ACACCGTTCN GGNCTGTTGG TATTGGGATA ATACACTT 1018



(2) INFORMATION FOR SEQ ID NO: 94: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 

GGCAATGTTC AAATCGATAT TGTGCAGCAC CTGGGTTGGG CCAAAGTGCT TGGAGACGTT 60 

TTTAAATTCA ATCACAGGAT TTTCATCCTT CTTTCCAGAC GACGCAGAAT AAAGCTCAGC 120 

ACCAGGGTAA TAATCAGATA GAACACCGCC ACGGCGCTCC AGATCTCAAG GGCGCGGAAG 180

TTACCGGCAA TAATTTCTTG CCCCTGACGG GTCAGTTCCG CCACGCCGAT CACAATAAAC 240

AGCGAGGTGT CTTTAATGCT GATGATCCAC TGGTTACCCA GCGGCGGCAG CATACGACGC 300 

GTGCCAGCGG TAAAATGACG TAGCGAATGG TTTCCCMACG TGAAAGACCG AGCGCCAGTC 360 

CTGCTTCACG AAAACCTTTG TGGATAGACA GCACCGCACC 400



(2) INFORMATION FOR SEQ ID NO: 95: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1857 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:



CGTGTTCCCC TGGCCNGCTT GGTTTCGCCA TAGACGTTGA GCGGGGAAAT CACATCGGTT 60

TCCACCCAAG GACGTTCACC ACTTCCATCG AAAACATAGT CGGTGGAATA ATGTACTAGC 120

CACGCACCTA ATGCTTCAGC TTCTTTGGCA ATAACCGCCA CACTAGTTGC ATTGAGTAAC 180

TCGGCAAATT CCCGCTCACT CTCCGCTTTG TCGACTGCAG TATGGGCCGC TGCGTTAACA 240

ATCACATCCG GCTTGACGAG ACGTACCGTT TCAGCCACCC CTGCAGAATT GCTAAAATCA 300

CCGCAATAGT CGGTGGAGTC AAAATCAACG GCAGTGATGT GCCCCAGAGG CGCCAATGCA 360

CGCTGCAGCT CCCATCCTAC CTGACCATTT TTGCCAAACA ACAGAATATG CATCAGGTAC 420

GCTCCCTATA GTTTTGTTCA ATCCAGGATT GGTAGGCACC ACTCTTGACG TTGTTAATCC 480

ATTGTTGATT ATCCAGATAC CACTGCACGG TCTTGCGAAT ACCAGACTCA AAAGTCTCCT 540

CTGGCTGCCA ATCCAACGCA GCGCTCATCT TGCAAGCATC AATCGCATAT CGGCGATCGT 600

GTCCGGGGCG ATCCGCCACA TAAGTAATTT GATCGCGATA AGAGCCAGCT TTCGGTACCA 660

TCTCGTCAAG CAGATCACAA ATAGTATGTA CTACATCCAG GTTCTGCTTC TCGTTGTGAC 720

CGCCTATGTT ATAAGTCTCC CCGACCAAGC CAGTGGTCAC TACCTTGTAG AGTGCTCGTG 780

CATGATCTTC CACATACAAC CAGTCACGAA TTTGGTCACC TTTACCATAA ACCGGCAGCG 840

GCTTGCCATC CAGCGCATTG AGGATCACTA GCGGGATCAG CTTCTCGGGA AAGTGGTAAG 900

GGCCATAGTT GTTGGAGCAG TTAGTGACAA TGGTTGGCAG GCCGTACGTA CGGTACCAAG 960

CACGCACCAG ATGATCGCTG GAAGCCTTGG AGGCAGAATA GGGACTGCTA GGAGCGTAGG 1020

AGGTAGTTTC GGTAAAGAGC GGCAATGCCT CACCGGAGGC TACTTCATCC GGATGGGGCA 1080

GATCGCCATA TACTTCATCG GTAGAAATAT GGTGGAAGCG AAAGGCCGCC TTGCTCAACT 1140

CGCCCAGACT GCTCCAATAG GCGCGAGCCG CTTCCAGCAA TGTATAGGTG CCTACGATAT 1200

TGGTTTCGAT AAAGTCGGCT GGCCCTGTGA TAGAACGATC AACATGGCTT TCAGCAGCCA 1260

GATGCATCAC GGCATCTGGC TGGTGCAGAG CAAACACCCG ATCCAACTCA GCACGATTAC 1320

AGATATCAAC TTGTTCAAAC GAATAACGCT CACTTGACGA TACACTGGCC AAAGATTCCA 1380

AATTGCCAGC ATAGGTGAGT TTATCCAGAT TGATAACGGA GTCTCCAGTA TCACTAATGA 1440

TATGACGCAC CACGGCAGAG CCGANAAAAC CAGCACCGCC AGTAACGAGA ATCTTCATAT 1500

ATTTCGCTCT CTTATTTTAC AATTAATAGC TATTAAAAAT AAACTTGTTG ACTCCGATAT 1560

ATTAGAAATA TCGGGATACC GAACTAAATA TTTTTATATG CTTTTGCCAA GCAGACTCTA 1620

TATCCACCCT GTATCACTAT GCTTTCTGGC ATACAATATC CCATCATTGA CACAATGATA 1680

AACATATAAA TAAAGAAAAT TTTAAATCAT ATAACCAAAT TACTTTCATT TATTATCAAT 1740

AAGTATTTTG ATAAGAATAC CTATACCACA GGGAGCCCCC TGAAACATAA TATTAGCGAA 1800

GAATGATAAC TGATAGTTAC CATCTTAGAG ATAAAAACTT ATTTGTGTGG CGGGATG 1857

(2) INFORMATION FOR SEQ ID NO: 96:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:

AGCTCTTTCG TGTAAAATAA AATACAGCAT ATCCTATATA GCTTACAATC ATTAAATGAA 60 

GTCGCCAATA TTTATATGTT TTATCAATAT CAGCTTGACT CATTGTTATT TCTTTGTCAG 120

GAGACTCTGA AAATATGGAC ATATATAACC TCTTTTATTA TGAAATATTT TCAATAATAA 180

TAATCCGTTA GTAATCCTAT CATAGGGTAA TGTCTCATCA TGTTAAAATG ATCACATTTA 240

TAATCATGTC AAAAAGAACA ACAGAAAAAA TCATATAAAA TCAATTAAAT ATAATTGCCA 300 

CATATTGTTG TTATTWAAAC ATTGGTGGTG AATTTAAAGC GAGAACAGTT TGTAACAGTG 360 

ACTCCTTGCA GACTAAGTTA GAGTCTCCTT CTAAAATTAG ACGGWKTTCT ATTGATGGAT 420

AATAGTAAGC GCACCGTGAA KGACGTGGGG TAAAAATTAG TTTACAGATT GAGTGACATT 480

CCAGGGCAAC AACTCTTTCA CGCGGTTGGC AGGCCAGGTG TTGATTACAC TGATCACGTG 540

GCGTACATTA CCGGACTCGA TTCCGTTAAG TTTGCAGCTA CCGATCAGGC TGTACATCAC 600 

TGCCGCACTC TCGCCTCCAC CATCAGAGCC GAAGAACATG TAGTTACGCC GCCCCAGTGC 660

AATACCCGGA GGCGTTTTCA CACAGGTTAT TGTCGATCTC CACCCAGCCA TTGCGGCAGT 720 

ATTCGTTCAG AGCGTCCCAT TGCTTCAGCA GATAGGTGAA CGCTTTCGCT GTATCCGAGT 780 

GGCGCGACAG TGCTCATCTG CCCCTGGAGC CACTCATACA ACGACTGCAT TAGCGGTACC 840 

GTTCTGGCTT TTCTGACCGC CAGTCGCTCT TCTGCCGGAC TGCCGCGGAT CTCAGCCTCG 900 

ATAGCGTACA GTTCACCGAT ACGCTGCAGG GCTTCCGTGG TGATGTCAGG TGGCGCTCTT 960 

GCATGCACAT CGTGGATTTT TCTCCGGGCA TGGGCCATAC AAGCCGCTTC GGTTACCTGA 1020

CCGCTTTCGT AAAGAGCATT GTAACCCGCA TATGCATCGG CCTGCAGGAT ACCTCTGTAG 1080

TCCGCCAGAT GTTGCTGTGG GTGGATGCCT TTGCGGTCGG GAGAGTAT 1128

(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 439 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 
GTTTGCTTAC GAACCGTGAA ATATGACGGT CCCATATAAC TGCCTGATAC TTGTATATCA 60

TATACTTGTG CATGCATGTC ATCATTAAAA AGTACTTTGT CACCGTCTTT AAGTTGAAGA 120

CGTGTAAAAT CTTTATACGG CAAGTAGACG GAAAACGGGC GCTTTCCCTG TCGCCAATCA 180

CACCGACATG ACTGACTTTT GCGAGAGGAA GTGCATAATT CACCAATTCA GAGCCTAATG 240

CATTGCGCTG GGTAAGCTCA AATCGGAATG GGTTTCGAAC CTTTCCCGCA ACATTGATCA 300

TTGGACCTTG TTGCTCAACT GAAAATCACA TCTTGATCTT TTAATGCCAG CTTCGGGAGT 360

TTCCCATACC GTATGAAATC ATAAAGATCA ATTTGCKGTG NTTACTGCTA TTTTGTGCGT 420

GAACACCTTA ATTTTTGCG 439



(2) INFORMATION FOR SEQ ID NO: 98: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 906 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 

TATTCGTAAT TAGTTATAAA CAGATGATGT AAACACCAGT TGACTAGAGT CAATCTTATA 60 

CTGGCAACAT CTATGATTAA TTTGTGTGGT TATAATTTTA AATATCTTAT ATTTATGGGC 120

TATTATTGAT ATCTGTCAGA GTATCAATAA TAGAAGGTAA TTGTTTTACA TACTATCAAC 180 

TTTTGGATAA CGTTTTAAAA TGCACCTTGC ACATCGTATT TTATTATTTT CACTAATCTT 240

TTTTATAACG GCCTGCGCAC ATGATCCAAA ACAAGTTGAA GCCTCTCGTC CATTGGTAAC 300 

AGCGATTAAT TCTTCTTATT CTCTTATTCC TGAAGATTTG CAGGCACCAT TAAATAACCA 360 

AGATCAAGGC ACGACATTCA ACAAAAATGG CGTAATTTAT ACTATTGAGG AAAGGTATAT 420

ATCGGCTTTA GGTTCTCAAT GCATAAAGTT AAGTTATGCG ATGAATAAAA ATTATTCAAA 480

GCGAAGTGTT GTATGTAAAG AGAATAACAA GTGGTATCAA GTACCTCAGT TGGAACAAAC 540

ATCAGTTAGC ACTTTGCTTA TTGAAGAATA AAGTTGAAGG TAGACGGTTA GAAAATAATG 600 

AAAATTTCGC AACTTAGCAC TCTTCTCTTT CTTATTTCTG CATCAGCATT CGCCGCAATA 660 

GAGCAAAATC AATCTAATGG TTCACATTTA GATTATGATC TTGCTGCCTC GACAGGAGAG 720 

TCTCGGAAAA TGCTAGCAGA CATCACTGGA CAGCCTAATA CAACCTCCAC AACAGGAAGC 780 

TTCACACAAC AGAATCGTAA TGGGATGTTG CTTCCAGGAG AGTCAGATGT ACGAAAATTA 840

CTGCCGCAAT CTGAAGCAGG CTTACCTCCT CCGTATGGTG CTAATTTATT TGCCGGAGGC 900

TATGAA 906

(2) INFORMATION FOR SEQ ID NO: 99: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1396 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 

GCGGCCTGAT ATATGCCGTT ATTACAAAAA GAGGATCAAC CACACTGCCT TTTGGACCGT 60 

GTTTAAGTCT GGGCGGTATA GCAACACTTT ATCTACAGGC ATTGTTTTAA TGATAACCAC 120 

GTCATTATCA AAGTGACATT TTAACTCTTA TTAATAACCT TAGAGATTAT TTACCATGTC 180 

GATAAAACAA ATGCCAGGGA GGGTATTAAT ATCGCTATTG TTGAGCGTTA CAGGATTATT 240

AAGTGGCTGT GCCAGCCATA ATGAAAATGC CAGTTTACTG GCGAAAAAAC AGGCGCAAAA 300

TATCAGCCAA AACCTGCCGA TTAAATCTGC GGGATATACC TTAGTGCTGG CGCAAAGTAG 360

TGGCACGACG GTAAAAATGA CCATTATCAG CGAATCGGGT ACTCAGACCA CGCAGACACC 420

TGACGCCTTT TTAACCAGCT ATCAACGACA AATGTGCGCT GACCCAACGG TGAAATTAAT 480

GATCACCGAG GGAATTAATT ACAGCATAAC GATTAATGAT ACACGTACAG GTAACCAGTA 540

TCAGCGGAAA CTGGATCGTA CCACCTGTGG AATAGTCAAA GCATAACGTC GGGTAGATAT 600 

AAATTGGCGC GGGTTGTTTT TCGTGACGCA CGAATTTATC TCATTCAATG GCTGACAAAA 660 

ATTCGTCACA CTCTTAACCA GAGACAATCT CTTAATACAG ACAAAGAGCA TCTGCGCAAA 720

ATTGCACGCG GGATGTTCTG GCTGATGCTG CTTATTATTT CTGCAAAAGT GGCGCATTCA 780 

CTCTGGCGCT ATTTCTCCTT TTCTGCGGAA TATACGGCGG TTTCCCCATC GGCGAATAAA 840

CCGCTCCGTG CGRATGCAAA AGCGTTCGAT AAAAATGACG TGCAATTAAT CAGCCAGCAA 900 

AACTGGTTTG GCAAATATCA GCCCGTCGCC ACGCCGGTAA AACAACCCGA ACCTGCACCT 960

GTGGCCGAAA CGCGTCTTRR TGTGGTGTTG CGTGGGATCG CCTTTGGTGC CAGACCCGGC 1020 

GCGGTTATTG AAGAAGGTGG TAAACAGCAG GTCTATTTGC AGGGTGAACG CTTGGCTCGC 1080 

ACAACGCAGT GATTGAGGAA ATCAACCGCG ACCATGTGAT NTGCGCTATC AGGGAAAAAT 1140 

AGAGCGCCTG AGCCTGGCTG AAGAGGAGCG TTCCACCGTT GCCGCGACCA ACAAAAAAGC 1200 

TGTCAGTGAC GAAGCAAAGC AAGCTGTTGC TGAACCTGCT GTCAGTGCGC CAGTTGAGAT 1260

CCCNGCTGCC GTGCGTCAGG CACTGGCGAA AGATCCGCAG AAAATTTTTA ACTATATCCA 1320

GCTTACGCCT GTGCGTAAGG AAGGGATTGT CGGTTATGCA GTGAAACCGG GGGCAGATCG 1380

TTCTCTGTTC GATGC 1395

(2) INFORMATION FOR SEQ ID NO: 100: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 380 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100: 

CACTTGAATA AAACTGACAC CGTTTACCTC CATAATAGTG AGCATAGCCG CCATTGCGGC 60 

CTGATCGGCG AACCGGAAAT CGCAACCTGC GAACGACAAC CGAACCGGCA AGCGTGCGGG 120 

AAGGACGGAT ACCGGACTCT TTCGCCACTT CAGCAATCAC CGGCAGCGTG GAAAAAACAA 180 

TAAACCCAGT ACCGGCCATA ATGGTCATAG ACCAGGTGAT AATCGGCGCG ATTATGTTGA 240 

TATATTTCGG GTTACGCCGC ATAAAATTAC CAGCGACGGT ACCAGATAAT CCATTCCCCT 300 

GCGGCCTGTA AGGCTGAGGC CGCCACAACA ACGGTCATAA TAATCAGGAT CACGTCGACT 360

GGCGGCGACC CCATAGGCAG 380 
(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 995 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 



CTTTACGGTT TAATAGGGGA ANGCCGACTG GATGNAAAAA TGGAATCTGG AGCCCAGAAT 60

AAATCTGAAT TTAATGTGGA CTGGATATGC TCCAATAACC CCGGCAGGGA GTCATCTGTG 120

CGAAGATATT TGCGTTATGC TGTAATATAA TAATTCAATG TATTTCAGGA ACAGTAATAT 180

ACTACAGTTT CTACTTTCTT GTATTTAATA AATTGTTCCG CATCGCTAAA AGCAGGTCTT 240

TCAGAAGCCA CAAGAATTCT GTGGTCCCAG TATTTTTAGT TATCCTATTT TTATATCTAA 300

CTTGTAATAC TTACAGCATT TTCATTCATC CTAATGGAAG GCTGTAATAA TCTTTGAGCT 360

TAGAAACATC AAAATTATGC ATCTCATTAA TTTTGTCAGT CACACGACCT CTGGTAAAAA 420

TAAAACCCCC AGAAATATGC CATTTCTAGG GGGGGCGTAA GAATCAATAT ATTTTAGTGT 480

TGTTACATTT AGCTCTTAGC TCTTAGCTCT TAGCTCTTAG CTCTTAGCTC TTAGCGTTTG 540

TAGTTTCATC GCAATGAGTA AAAGGACAAC AAGAATAAGT GATAACGTTA AGAGAAGAGC 600

ATAGAAACCA TTCCAGTGGT ATATTTCTAT TATTTTAGAC AATGGATAGC CAGCCGCGGA 660

CGCACCAAGA TATGCGAATA AACTAACAAA ACCAGTAGAA GCACCAGATG CATATTTATG 720

TGAGTTTTCA GCAGCTGCCA TTGCGATCAG AAATTGTGGC CCAAAGATAA AGAAGCCAGT 780

GATGAAAAAT AATAACGAAA AAACATATTT ACTATCAATA GAAACCAACC ATAGACATGC 840

AGAAGCAATG ATTATACCAA TTGTATAAAT AACATTCATT TGAGAGCGAT TGCCCTTAAA 900

CAGAATATCT GATCCCCATC CAGCTACGAT AGCACCAAAA AAGCCTCCAA CCTCAAACAT 960

CATTACTGTT CCATTTGCTG TTAGCAAGTC ATATT 995

(2) INFORMATION FOR SEQ ID NO: 102:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 817 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 

TAAAAGCGAC TCCATGTGAA ATTTCTGTTT GTCGTTTTTT CCCCGTTGTA GCGGCTCTGC 60 

TCCTGGCTTC CCTGATAGTC AGCCCGCAGG CGCCAGGGCC CCAGATTCCC CCCCACAGTC 120 

CCGTTATAAC TGAACTGATG AGAGTCTCCT CCCTGATAAT TACGGGAAAC CGTCCCGTTG 180 

AGGTTATAAT CCAGCATCAG TCCGGGAATG CCGTCGTCCC AGCGTGAGGG AGGCAGCCAG 240

GTGGCATCAG AATACTCAAG CCAGGCCTGC GGCATATTGA TGCGTAATAC GCCCGCTCCG 300 

GTATCAGGAC GAATATCCAC TCCCGGCAAC CCATGAAAAT CCGCACACTG ACCATCATGC 360 

CAGTAAACAA CTTTATCCAG AGATTCTGCT GTTAACCCCA TCAGTCTGAC CATATCTGAT 420

GTCAGACAGC TGCGGCAATT TTTTTTCTGC CTTATCTCCT GACAACGCAG GTTCAACAAA 480

TGAMATCTGT AACGATGCGG GAGAAATACT TTGCCCGTTA ACAATCACAT CCAGAAGATA 540

TTGCCCCGGC AGAACATAGC CGGCTTCTGA AAAACGGGTG AAGTCAATAT TTTTCTTGTC 600 

CGCTGCGTCA AGTACATCTG TATTAAACTC AACGGCACTG GCTGCGTTAC AAAACAGAGA 660 

CAACAATATC ACACAGGTAA TATTGTTGAC TGCAAAAGGT ATTCTGTCTT TCATTCCACG 720

CATCACCAGA TTCACAAAAA AGATAAATAA CCGGACATCT CACCGGAGTG ACTCACTCAT 780

AATCGACCCG GAATCCCAGC ACAGCAAAAT AATTTCC 817 



(2) INFORMATION FOR SEQ ID NO: 103: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 709 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: 

TTTTTGTCAG AGCGTTCACT CTCTGGCTGG ATGATTTCGG CTCGGGAAAT GCAGGCTTAA 60 

TGTGGGGACT GTCGGGGATG TTTGAACGGG TAAAAATAAG TCATGAGTTT TTTCATTATG 120 

TCCTGAAAAA CGGGTGTGCA ATGCCACTTC TCCGTGCTGT GGCAGACACT GTTGCCTGTC 180 

ACAACAGAGG CGTGATACTC GAAGGTGTTG AAAATGAAGC GTTGTTCCGT ATTGCCAGAG 240

ACATGAATGT CCAGGGCTGT CAGGGATGGC TCTACAGGCG TGTGGGGGTT GATGAATTAT 300 
CCGCGCTTAT TCAGCAGTAT GAATAATCCT TTTTCACAGA CTGGTCAGCT GTCAACATTT 360

ATGTTTTTTT ATCTGCGGGA ATTTATCCGT CTGCCTGTCG GGACTACTCT GTCATACAGA 420

AATCAGGCCA GAATAAATTG TTGTGGAAAG GTGAGATTTA CCGGATGACT GATGTGCTCT 480

TGTGCACAGG TATACAGGCA GTGTGTTTCC AGTATATGGA AAATGATTAA ATGAATAACA 540

CAGACTTATT AGAAAAAATG ATCAGGCATC AACAAAACAA AGATCCTGCA TATCCTTTCC 600 

GGGAACATCT TTTGATGCAA CTCTGTATCC GTGTAAACAA AAAAATACAG AACAGTACAT 660

CTGAGTTTTT TGGTGCATAT GGTATAAATG ACTCAGTATA TATGGTTCT 709



(2) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 485 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:

TCATCAAGGG ACGGGGCATA TCTGGATGCG ACAGGGCAAA CCAACCACTG AGAATCCAAC 60 

CTGCCAAAGC CTGACCAGGA AGTCCGACGT TAAAGAAACC AGCTCGACTG GCAACGGCAA 120 

AACCAAGACC AATCAAGACC AGAGGACCCA TAGCACGGAA GATTTCTCCA ATCCCACGCA 180 

GACTGCCAAA GGCTGTATAG AACAATTCTT CGTAGCCCCA AATAGCATCA TAACCGAAGA 240

TCCACATGAC AATGGCTCCG AGTAAAATTC CTAGGAATAC AGAAATCAAG GGAACCGAAA 300 

TTTGTTGTAA TTTTTTAGAC ATCACTCTTC TCCTTTCCCA AGTTYCCACC AGCCATCAAG 360

ACACCAAGTT CTTGTTTATT GGTTGTTTCT GGTGATACAA TACCTTGAAT CTTACCATCG 420

TGGATAACGG CAATACGGTC TGAGACGTTT AAAATCTCAT CCAATTCAAA GCTGACNACA 480

AGGAC 485



(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 459 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105: 

AGCAGAATAG GCAACATCAC CACGCCGACA AACAGCGAGA AGAGAATGAC GCCAGCCGCC 60 

AGGAACACCA GCTCATAGCG CGCCGGGAAG ACGTTACCAT CCGGCAAGAG CAGCGGGATA 120 

GAGAGCACAC CGGCCAGAGT GATCGCCCCA CGCACCCCGG CGAAAGACGC GATCAGGATT 180 

TCTCGTGTGG TCCACGAACC AAACTCCATC GGCTTCTTCT TCAGGAAGCG GTTGCTGAAC 240

TTTTTCATCG TCCACAGCC. ■ A GCCGAAACGG ACT C AG CATC A GCGCCGCATA TATCAGAATA 300

ATATTGGTAA ACAGCATCCA GATTTCGACG TTAGGGTCGA TTTCTTGCTG GCCATCAGCG 360

GACGTCTTCC AGRATTACCC GGCAGCTGCA GACCTTAACA GCAGGGAACA CCATGGCCGT 420

TTTAAGGACA ATTTCNAGCA TCGGCCCANG TGCTGTTTT 459



(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 908 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: 

TTAATAGCAC TAATACTGTC CTGCTCTATT CCGCTGACAT TTTCAGTCAG CTGCTGTATG 60 

GGATGGGTTA CCCAAAACCA GACCAGCATA CCTGACAAGA GACCGCATAT CACTACCAGA 120 

AACAGCGACC AGTACAGTGC ATTCCATAGT GCCTTTGTCC AGGCTGTATC AGTAAGAGCA 180 

TTAAGTTCCT CTCCCTGTAA AATAATATAC AGATATCCTT TCGGTTCATC ACTCTGGTAA 240

AGCGGTGCGG TACTGAAAAC TTTTTGCTTA TTTACACTTC GGGGATCATC ACCATATACG 300

GGCCAGACAC TGCCGGAGAG AAATTTTTTC AACGGTGCAA TATTGATATA CCGGCGTTTG 360 

AGATGACCCG GAGGGCGGCC TCCACAAGCA GTCGCCCTTC CGGTGAAACC ATATACAGCT 420

CCACACTGGG ATTAAGCGTC ATCAGACGCT CAAACAGACT CGTTAATGTC CGGTGTTACC 480

AGACAAAACA AGCATCGCAA GACGCCACAA ACGGTGCGCT TACTTAAATA AGCCGGTTAC 540

AGGTGAAAAA TCACGTCCTG ATATTCAAAT GTTTTTTCAG GTCATATTTT AGCAGGACAC 600 

TACCAGCACC TAACAGCAGC ACATCTTTTA TAACAAAACT GTCAACTTTC CCCAGTTGTG 660 

GTAACAGGCT GAGCGTGGTT ATTCCTGTAA CAATAACGAT AATATCTCCC AGTACACCAG 720 

CAGCAGGCCT GAAGAAACCG ATAATCAATG CCAGAAATGT GATAGTTTCC ACTATGCCGA 780

GGAAATAGCT CCCTCCATGA ATACCAAATA TAATATACAG GATATTCAGC CAGGTGGGAT 840

ATATCAGGGG CTTGAGAGCC ATAACTTCAA AATCAAACCA TTTATAAGTC CCAAAAAGCA 900

TAAATATT 908

(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1057 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107: 
CGGGCTAACC CAATATGCTT TATTAACCCG GGATAATTAC CCTGTTGCAT ATTGTAGTTG 60

GGCTAATTTA AGTTTAGAAA ATGAAATNAA ATATCTTAAT GATGTTACTT CATTAGTCGC 120

AGAAGACTGG ACTTCTGGTG ATCGTAAATG GTTCATTGAC TGGATTGCTC CTTTCGGGGA 180

TAACGGTGCC CTGTACAAAT ATATGCGAAA AAAATTCCCT GATGAACTAT TCAGAGCCAT 240

CAGGGTGGAT CCCAAAACTC ATGTTGGTAA AGTATCAGAA TTTCACGGAG GTAAAATTGA 300

TAAACAGTTA GCGAATAAAA TTTTTAAACA ATATCACCAC GAGTTAATAA CTGAAGTAAA 360

AAACAAGTCA GATTTGAATT TTTCATTAAC AGGTTAAGAG GTAATTAAAT GCCAACAATA 420

ACCGCTGCAC AAATTAAAAG CACACTGCAG TCTGCAAAGC AATCCGCTGC AAATAAATTG 480

CACTCAGCAG GACAAAGCAC GAAAGATGCA TTAAAAAAAG CAGCAGAGCA AACCCGCAAT 540

GCGGAAAACA GACTCATTTT ACTTATCCCT AAAGATTATA AAGGGCAGGG TTCAAGCCTT 600

AATGACCTTG TCAGGACGGC AGATGAACTG GGAATTGAAG TCCAGTATGA TGAAAAGAAT 660

GGCACGGCAA TTACTAAACA GGTATTCGGC ACAGCAGAGA AACTCATTGG CCTCACCGAA 720

CGGGGAGTGA CTATCTTTGC ACCACAATTA GACAAATTAC TGCAAAAG I A 1 CAAAAAGCG 780

GGTAATAAAT TAGGCGGCAG TGCTGAAAAT ATAGGTGATA ACTTAGGAAA GGCAGGCAGT 840

GTACTGTCAA CGTTTCAAAA TTTTCTGGGT ACTGCACTTT CCTCAATGAA AATAGACGAA 900

CTGATAAAGA AACAAAAATC TGGTGGCAAT GTCAGTTCTT CTGAACTGGG CAAAAGCGAG 960

TATTGAGCTA ATCAACCAAC TCGTGGGACA CAGCTGGCCA GCCTTTAATA ATAATGTTNA 1020

ACTCATTTTC TCAACAACTC AATAAGCTGG GGAAGTG 1057



(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 752 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: 

TACCGGGCCC CCCCTCGAGG TCGACGGTAT CGATAAGCTT GATATCGAAT TCCTGCAGCC 60 

CGGGGGATCC ACTAGTTCTA GAGCGGCCGC CACCGCGGTG GAGCTCCAGC TTTTGTTCCC 120 

TTTAGTGAGG GTTAATTTCG AGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA 180 

ATTGTTATCC GCTCACAATT CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT 240

GGGGTGCCTA ATGAGTGAGC TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC 300 

AGTCGGGAAA CCTGTCGTGC CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG 360

GTTTGCGTAT TGGGCGCTCT TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC 420 

GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG 480






GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA 540

AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCT GACGAGCATC ACAAAAATCG 600

ACGCTCAAGT CAGAGGTGGC GAAACCCGAC AGGACTATAA AGATACCAGG CGTTTCCGCG 660

TGGAAGCTCC CTCGTGCGCT CTCCTGTTTC CGACCCTGCC GCTTTACCGG ATANCTGTNC 720

GGCTTTCTCC CTTCGGGAAG CGTGGCGCTT TC 752



(2) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 486 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109: 

CTTGGGTAAT NGACCTCATA TCCCTCCGCC AAAAAAGGAT CTACATGCGA TTTTGCGAAG 60 

CCAGCGTTGA TTGTAGGCGA GAGAATGGTT CTGTTGTTTT GGTACATTTC AGTTGTCATG 120

GATTTCACAA ATGTAGCATG ACCTTTCACC TGTCCAAGAG ACTGCAACAC CATCTGTCCA 180 

AAACAATAAA TAGGAATCAA ACAGGCTACC AACATCAACA AGTATCCCAA TAAGGCTCGT 240

AGTTTAGTCC TTGACATGAC GCCCCTCCAA TTGCTTTTCT AGTCCTTTGA CAATCCGTCG 300

ATTACGATAC ACGCGATACA GCAAGAGAAG GATGACCGCC ATCGCTCCTA GTAATAACCA 360

CAACCAGAAT TGCCCACGCT CTCTCACCGC TCGATTCCGC TCTGCAATTG GTGCCGTATA 420

CGGAATCCGC TTCCCACGTA CCAACAGACG ATGACTGTTA ATCCTATACG GTGTACNAGT 480

CAACCA 486

(2) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 313 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110: 

TTACGCNTTC AACCAGGTCT TCTGGTTTAC CAACGCCCAT CAGGTAACGC GGTTTGTCTG 60 

CCGGAATTTG CGGGCATACA TGCTCCAGAA TGCGGTGCAT ATCTGCTTTC GGCTCACCCA 120 

CAGCCAGACC GCCGACAGCG TACCATCAAA ACCGATATCT ACCAGACCTT TAACAGAAAT 180 

ATCACGTAAA TCTTCGTAAA CGCTGCCCTG GATGATACCA AACAGCGCAT TTTTGTTTCC 240

GAGACTGTCA AAACGCTCAC GGCTACGTCG CCCAACGCAG AGACATCTCC ATGGAGCGTT 300 

TTGCGTAATC CCA 313 

(2) INFORMATION FOR SEQ ID NO: 111:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1613 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 



CGGAAATCCC AGTAATTCCA TCCTCANATA TTCCACTCAN CCTCACTGTA ACAAAGTTTC 60

TTCGAATAAT AAAAATCATG CTTTCTGTTA TCAACGGAAA GGTATTTTTA TTCTCTGTGT 120

TTGCTTTATT TGTGAAATTT AGTGAATTTG CTTTTTGTTG GCTTTATNTG ATGTGTGTCA 180

CATTTTGTGT GTTATTTTTC TGTGAAAAGA AAGTCCGTAA AAATGCATTT AGACGATCTT 240

TTATGCTGTA AATTCAATTC ACCATGATGT TTTTATCTGA GTGCATTCTT TTTGTTGGTG 300

TTTTATTCTA GTTTGATTTT GTTTTGTGGG TTAAAAGATC GTTTAAATCA ATATTTACAA 360

CATAAAAMMC TAAATTTAAC TTATTGCGTG AAGAGTATTT CCGGGCCGGA AGCATATATC 420

CAGGGGCCCG ACAGAAGGGG GAAACATGGC GCATCATGAA GTCATCAGTC GGTCAGGAAA 480

TGCGTTTTTG CTGAATATAC GCGAGAGCGT ACTGTTGCCC GGCTCTATGT CTGAAATGCA 540

TTTTTTTTTA CTGATAGGTA TTTCTTCTAT TCACAGTGAC AGGGTCATTC TGGCTATGAA 600

GGACTATCTG GTAGGTGGGC ATCCCGTAAG GAGGTCTGCG AGAAATACCA GATGAATAAT 660

GGGTATTTCA GTACAACACT GGGGAGACTT ATACGGCTGA ATGCTCTTGC AGCAAGGCTT 720

GCACCTTATT ATACAGATGA GTCGTCGGCA TTTGACTAAA TTATGGCATT CCGGAGTTTC 780

TGGAAGATAA AAAAAGAAGC CCTTATCAGA AAGCAGACAG GTTATATCAG TATTCTGTCG 840

ATTATTTCTA TTGATCTGGT TATTAAAGGT 900

AATCGGGTCA TTTTAAATTG CCAGATATCT CTGGTGTGTT CAGTAATGAA AAAGAGGTTG 960

TTATTTATGA TTAAGTCGGT TATTGCCGGT GCGGTRCTAT GGCAGTGGTG TCTTTTGGTG 1020

TAAATGCTGC TCCAACTATT CCACAGGGGC AGGGTAAAGT AACTTTTAAC GGAACTGTTG 1080

TTGATGCTCC ATGCAGCATT TCTCAGAAAT CAGCTGATCA GTCTATTGAT TTTGGACAGC 1140

TTTCAAAAAG CTTCCTTGAG GCAGGAGGTG TATCCAAACC AATGGACTTA GATATTGAAT 1200

TGGTTAATTG TGATATTACT GCCTTTAAAG GTGGTAATGG CGCCAAAAAA GGGACTGTTA 1260

AGCTGGCTTT TACTGGCCCG ATAGTTAATG GACATTCTGA TGAGCTAGAT ACAAATGGTG 1320

GTACGGGCAC AGCTATCGTA GTTCAGGGGG CAGGTAAAAA CGTTGTCTTC GATGGCTCCG 1380

AAGTGATGCT AATACCCTGA AAGATGGTGA AAACGTGCTG CATTATACTG CTGTTGTTAA 1440

GAAGTCGTCA GCCGTTGGTG CCGCTGTTAC TGAAGGTGCC TTCTCAGCAG TTGCGAATTT 1500

CAACCTGACT TATCAGTAAT ACTGATAATC CGGTCGGTAA ACAGCGGAAA TATTCCGCTG 1560

TTTATTTCTC ACGGTATTTA TCATGAGACT GGGATTGTCT GTTGCACTTT TCT 1613

(2) INFORMATION FOR SEQ ID NO: 112: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 930 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: 

NTAGTCCATG GCCCCATGGA GCGAANTCCA AAGTGTGGAT ATTGTCGTTT TAATTCATCC 60 

CAAAAGCTGA AATACGCCAA AACCCACGTT CCCTAACATT GGTATCATGC ATAATGACCA 120 

CAGCCNTTCA GAAAGCTTTG GCAACCAGCT TTCAAAATCA TGGGTACCGC TTCAAACGTA 180

TGCAAACCAT CAATATGAAG CAGATCAATG CTACCTTGTG AAAAATGCTC TAACGCTTGG 240

TCAAATGTAC TGCGAATGAG AGTAGAAAAA CCTGAATAGT GCTGTTGATT ATATTCTGAT 300 

ACTTGCCTGT AAACTTCTTC GCCATACAGC CCCGCATGTT CATCTCCCCC CCAGGTATCA 360 

ACGGCAAAGC AGCATGTTTC TAAATCTAGT TTAGAGACTG CTTGGCAAAA TGAGAAATAA 420

GAACTTCCAT AATGAGTTCC CAGCTCAACA ATATTTCTTG GCCGCAGTGT GTCAACTAAC 480

CAGAAAGCAA AAGGAATGTG TTCTAGCCAA GCAGATTGTG CAAGGTATGT AGGACACCAN 540

AAAAGAGATG GTTTGAAAAT GAAATTCAAT TCCCTGCCAA TATCAGTGAT GGGATATAAC 600

TCACGATTCT CTACTAACTG ACTAATTTTT TGACTATCCA TTGAGGAAAA CTCACATGTA 660 

TTTATAGAAT TAAATCAAGA AACCTGAAAA TACCTATAGT GCGGTAACTT ATTAACTAAC 720 

ATTTAAATAT TAACAATACA CTTGGAAATA TTAGTTAAAA ATAAATCATT ATGATTTCTC 780 

ATCAATCCTG GTGCTCACGC AAAGTTGCCA GCCCCATAAT AATAAGACCA TAGAACAAGC 840

AAAGTAATAC ACCCACAGTC GCAAGATTAT AGAATCGCCG TGGATATTCG GCATCTTCCG 900 

CTAAAGTTGG TTGGGTAATA ACCAATAGAT 930 
(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 659 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 

ACGATATCCC CCCTCTGCTT TTGAGAGGCA ATCTGCTTTA ATACATGATT CATCACAACA 60 

CCTCTTGCTG CGCTTTGATC TTAATTTTAT ATTTTTGGGT AGGGAAAAGT AATTGCCCCT 120 

GATACGGCTC ACCATTTACC AACGTTTCAC AGCTATGTTC CAGAGCTAAA TTAAGACCTG 180 

GTAGAATATC CCAGCAATTC ACCCCTTTGA CATTTTCAAA GCTGTCATAA GCACCGGNNA 240

AGGGGGGGCC AACATGTTAT ACATGGAGCA GCCAATGATA CGATATTCAA AGCCCTCTTC 300 

CAGTTGCATC AGATCCTGCT TGGTAASGGA GGAAGAGAGG CCACGAATAC GAGAGCGATG 360

ATGTGTAATC GGCATACCTG TGATATGAAG ATCATTCAAT TCAGGTAAGA AGATGCAGGA 420

CTCTTGATGT TTCCCCTCGG TGTAAATGCT GATACCAATG CCCCACTCTT TGAGCCCAGA 480

GACAAAGTTT TCTGTGCCAT CAATTGGATC TAGAACAATG TAAGAACCTT TGGGATTCCA 540

CTCAATATCT CCTAAAGGGG CTAATTCCTC TGAAATTAGC ACATGCCCTG GTAGATGCTT 600 

TCTACAGAGT TCGAAAACTA TATCTTGAAC TTTTAGATCC AGTACTGCGG CCGCGATCC 659 



(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 556 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 

CCCGGATATA CATCAGGAGA AATTGGAGCA GCAATTGGAT GCGCCATTAA TGCCTGGTTA 60 

GGGATCCCCG CATGTGGGCA CGCAAATGGC TCAGAATATG ATCGACCTTC ACCAGATAAA 120 

CCAAATCTGA GCGAACCATT TATCCCAAGA CCCACGTATG ACGCTTCACT TCATTCCTGG 180

CATGGCGGAT ACTGAGTAAA TCATCCTGAA TCATTATGTT CAACATCATC AATTCTCCGG 240

ACTTGTTGTC AGATGTCCGG AGAATATTAA CCTTTTCTTC AGAAACAGAW TGATCAAGAA 300 

TCACACTCCT TCTTTAAGAG GATTTTATCC AGAAAACTGA CTTTCTTCTA TCAAAATMAC 360 

AGTATCCTGT TTTATCAGGA ATAATCTTTA CCTCCGGTAT CATTCCCATA ATCAGATATC 420

AGAAAAATGT GCCAGTAATT TTTTACTGAT GACTTCAAAC ATTTCACATT CATCACACGT 480

CAGATTACTC CAAAGTTCTT TCAGATATGT GTTCTGCGCC AGAGTGAGTC TCTGAATAAA 540

AAACATACCT TCAGAC 556 

(2) INFORMATION FOR SEQ ID NO: 115:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 503 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:
TACCTGTTTG TGGAATTTGA CCCAGAAGTG ATTCATACCA CGACTATCAA CGCGACCCGN 60 
GTGTNCAGCC ACTTCGTGCG CTTTGGCGTN CGCAGCGATA GTCCCATCGG CGGTTATTCA 120 

TCAGCTATCG GTATATAAAC CGAAAGACAT TGTCGATTCC GGCAACCCCT TATCCGGGTG 180

ATAAGGTGAT TATTACCGAA GCGCGTTCGA AGGCTTTCAG GCCATTTTCA CCGAACCCGA 240

TGGTGAGGCT CGCTCCATGC TATTGCTTAA TCTTATTAAT AAAGAGATTA AGCACAGTGT 300 

GAAGAATACC GAGTTCCGCA AACTCTAAAA CGCAATCCCA AACAGTGTTT TGACATTAGC 360

ATCCGTGGTG GCAGCCAGCC ATGCGGCATC TTCTCCACGC CAGTGCGCAA TACGTTGCAA 420

AATATGGGGC AGATGGGCTG GCTCGTTGGG CGGGGATGAN GGCTTTGGCG TGAGATCGCG 480

AGGGAGCAGA TACGGNGCAT CAG 503 



(2) INFORMATION FOR SEQ ID NO: 116:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 433 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: 
TTTAACATCA AAATTACCTG CAGCTGAAAT GATTTTGCTG ATTTCATTAA TTAATGGATT 60 

AAGATTACCC TGACTTCCAT AGGCTAATGC ATCATTCCCA TACACATAAC TTGCCTTATT 120 

ATTACTCTGT TGATACTNAA GTGCCTTTTT AAGGGAATCT GGTGTGATTA CCCTGCCGTC 180

TTTATCAAAA ATCTGCTCTA TCTGGTGATT AGAGATATCA CCTGACTCTT TTTCAAACCA 240

GTTTTTAAAT GTAATACCAT TTTTGTGGCC AATGGAAAGA ACATTACCTT CAGCTTTATA 300 

CATGATGAGG TCATTACCTT CTCGCCTGAA GGCCACATCC CGGAAATCAA TATCAGCCAA 360

ACTGAGTTTA TCGTCTTTCC CCCCATCATC GTCAATAATA TGATGGCCAT ATCCTGAAAG 420

ATAACGATAA ATA 433

(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 302 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 

GCGCTCTGTT CCCGTTCCTG TTCATCACCA TCGCCTGTGG TGCGGTATCT GGCTTCCACG 60 

CGCTGATCTC TTCCGGTACG ACGCCAAAAC TGCTGGCTAA TGAAACCGAC GCGCGTTTCA 120 

TCGGCTACGG CGCAATGCTG ATGGAGTCCT TCGTGGCGAT TATGGCGCTG GTTGCTGCGT 180 

CCATCATCGA ACCGGGTCTT TACTTCGCGA TGAACACCCC GCCTGCTGGC CTTGGCATCA 240

CCATGCCTAA CCTGCATGAA ATGGGGTGGC GAGAACGCGN CGGATTCATC ATGGCGCANT 300

(2) INFORMATION FOR SEQ ID NO: 118:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 656 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: 



AATTAATAAG CCAAATACTA CATCACGTAA TACTTGCAAA GAAGTGCGTG GAGTTTGACT 60

AATAATGGGT TTGTCCATTA ATACTTACCC AAATAATCGG CTCATTATAG CAACGAGCCT 120

CCGATTAAAA TTTAAAATAC TCAATCATTT AATAGCAACG TTAGCAGCTA CAGCGATTTG 180

ATAAATAATT TGTGTGATAT CTTTAAATGA TTGCATGGTT TTGCTATCAA CCTGAGGTAG 240

AACCAATATC TGATCCCCCG GTTGTACTTT ACCTTGCCCT TTAAATTCTA CAAGACCATT 300

TGCATGTACA ATAGCAATTC GCTTGTCGTT AGCTCGCTCA GTAAAACCTC CGGCCCATGC 360

AACATAATCA TCCAAATTAG CATCGGCATT ATATACTACT GCTTGTGGCA TCAACACTTC 420

ACCCCCCACT TGAATAAGAT CAGTCTTATT TGGAATAACT ATTTGATCGC CTTGTTCTAA 480

TTGGATAWTG GCAATAACAC CTTTATCTGC AACTACTACT TTACCAAGCG GTKGAACTTT 540

ACGAGCCTTT YCAACAAACT GCATCACTAA CTCTGCTTCT TTAGCACGTA TATTCGCCTC 600

ACCATCAGAT CGCGCGGGTG TGGTAAANTT CATACGTTCC AAGCGGTTTA GAGATT 656



(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 436 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119: 



ATATGTTATC TGGATCCAGA TAAAGAGCGT TCTTGACCCG CTATATCCAG ACAGGTCAGT 60

TACACCCTGT CCGGAAAAAC TGATCGGAAT AACAACAGTA TATTTTCTAA TACACTGGCA 120

AATGGTGCCG GCGGTGTGGG GATTCAGCTT CTGGATAGCG CTGGTAATGC GGTTGCTGCT 180

GGACAGAAGA AATATCTGGG ACAGGTAGGA CCATCAACAT CTCTCAATAT TGGATTAAGG 240

GCATCTTATG CACTGACCAA TGGACAGACT CCACCTACTC CCGGACGAGT TCAGGCGTTA 300

GTTGATGTTA CCTTCGAGTA TAATTAGGAA TGTCGGGGAT GGGCTATCCC CGATATTATT 360

GCAGGATTAG TCTGTGATAC AGATATACAG CCCATATGAA CAACTGTTTG CATATATAAA 420

AATGATGATA ATTTTA 436

(2) INFORMATION FOR SEQ ID NO: 120:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 559 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120: 

AATAATTAAA TTTGGAGGGA TCAGTTTTCT GATAATGTTC TGTTATTAAA ACATTATCCC 60 

ATGGGGCGTA GTTATATCAA TTAGCAGGAT CTTATGAGTT AACTAACATC AGTTTTGAAT 120 

TTTTAATGGG GGTAATTTAT CTTTTACTAA AAATATTTTA ACTATTAATA TAGCATCATG 180 

GTTGTTACGG TTTGTTTTAA TTCTATTTTA TAATGTGCTA TATATTGTAT TTTTGTGCTT 240

AGATAAATAT GTTTTTTCAT TACTTTAGTG ATGTTAATAT TTTGCGTGTA GTAAAAATCA 300 

TTGTTATAAC AAATGTCACT GTTGCTATAC TTTGCTGAAC TGTTTATCGG TCATTTTGAT 360

TCAATCACTG GTTCTATATT TTTTAATAAC CGTTCTGTAG CGATTAATAT ATTGCTCTCC 420

AGAGGATACA CTATATGAAA TATATTAAAA GTCATTAATT TTNATTCAAT GTTGTTTAGA 480

GTTATGTTCA GTGTTTGGNA ATAGGATGTG TTTCTAAACC GTCTTGGGTT CTATAATAAA 540

TTCTATTCTT ANAGGTTTT 559



(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 481 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 

CATGTCCCTT CCTGAATACT GGGGAGAAGA GCACGTATGG TGGGACGGCA GGGCTGCTTT 60 

TCATGGTGAG GTTGTCAGAC CTGCCTGTAC TCTGGCGATG GAAGACGCCT GGCAGATTAT 120 

TGATATGGGG GAAACCCCGG TACGGATTTA CAGAATGGTT TCTCCGGACC TGAAAGAAAA 180 

TTCAGCCTCC GGCTCAGGAA TTGTGAATTT AACAGTCAGG GTGGGAACCT TTTCTCTGAT 240

TCCCGGATAA GGGTGACTTT CGATGGCGTC CGGGGTGAAA CGCCGGATAA GTTTAATTTA 300 

TCCGGTCAGG CAAAAGGCAT TAATCTGCAG ATAGCTGATG TCAGGGGAAA TATTGCCCGG 360

GCAGGAAAAG TAATGCCTGC AATACCATTG ACGGGTAATG AAGAAGCGCT GGATTACACC 420

CTCAGAATTG TGAGAACGGA AAAAAACTTG AAGCCGGAAA TTATTTTGCT GTCTGGGATT 480

A 481 
(2) INFORMATION FOR SEQ ID NO: 122: 



WSDOCID <WO. _ 9822575A2 I 



WO 98/22575 



PCT/US97/21347 



-229- 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 535 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: 



CCATATAGTG ACTTCATTGA ACAAAATGTA AATGGAATCT TGCTGGAGAA TGACCCACAT 60

ATATGGATAA AAGCTCTTTC ATTACTTGTT AGTGCAGATC ATAAACGTAG CGAGTTGGCG 120

TTCAATGCTA AAAAATATGC TTGTAAAATT GTAGGTGTCG AGTAAAAAGA TATTTTTATT 180

TAATTGGTGC TATTGAATGT TTAAAAATCG AACTGATTGG TGTTTTAATA TTAATCATAG 240

GTTATGATGC AAAAATATAT TAGGCATTGC CTGCTTCAAT TAACTTGAGA GTGTAAGTTG 300

AATTGAAATA TGGTTATATG ATAAAGCAAT ATATGTTAAT ACATATGTCA ACCGAAAATG 360

CCATTATGTG TTTTTTACTT TATCTGTAAC GACACAATAT ATAAAATAAG GCTAATAATC 420

AAAACGCTTT TTAATTTGAT TGTTTTGAAT CAAGTGACTA AGAAATTCTC TTGCTGCAAA 480

TAACTCCCTT AGTGATTTTT TTTGAGTCTA TTTTATTCTC TGGGCATGGT CATGC 535



(2) INFORMATION FOR SEQ ID NO: 123:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 412 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123: 

CCGGCCCCAT AATGATGGTT TTATTAAGGT TAGCGCCGAC GGTTTCGATG AACGATTTCA 60 

GGTCGGTATC TTTAAAATTA GCGGTGAAAG TGGCTTCTTC CGCCCAGACC GGTGAACTGC 120 

ATAATGCCGC TGCCAGCACC AGCGGCAGTA AACGCTTTTT TGTTTTGAGG CCAGTTGTCT 180 

TCTTACGCCA GACCGACAAC GTCATATCAC GCCAAAACAC GATGAATGAT TCTCCTGGAT 240

TAAATGCGGT TAGCGCAGCG CGATGGAAAT GTCGTGGCGC GCACCCTTGC GTAAAACCGT 300 

AAGTTGAATG GAATCCATTG AAGGTAACTG CCGCATCAGA GCAATCATTG CTCGTGGATC 360

AGTGAAATCC TGCTGATTTA GCGCAAATGC GATATCGCCT TCCTTAAAAC CG 412

(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:

TAGCCTGTTC AGCGTATATT TGGGATGAGA AGCCAAAGTG GCTTTGGTGG TGTCCCAGCC 60 

CAGGTTTTTA TTACTGCTGG TTATTTACCT TTCATGTTTT TCAATAAAGT TGTGACTCAG 120 

TTGAAATCTG CTGTCAATGC TAATATGGGA CTTTTTTGTT ATAGACAAGT GACTCCTTTT 180 

GCAACTTTTA TAGCACGTTT TATGCTAGAA ACAATGGTGG GCATGATTGT CGGTATAATC 240

CTAGTACTAG GATTATTGTG GTTTGGCTTT GATGCAATAC CTGCGGATCC ATTGCAAGTG 300 

ATCCTTGGTT ATTCTCTTCT GATGCTGTTT TCTTTTTCTC TTGGTATTGT ATTTTGTGTT 360 

ATTTGTAATT KRGCGARAGA GGCAGATAAA TTTCTTAGCT TGTTAATGAT GCCTTTGATG 420

TTTATCTCTT GTGTTATGTT TCCTCTTGCT ACTATTCCCC CTCAATATCA GCATTGGGTT 480 

TTTATGGAAT CCACTTGTGC ATGCTGTAGA ACTAATCCGA AGGGCATGGG ATATCTGGGT 540

TATCGTAGTC CTGATGTAAG TTGGGCGTAT CTGTCG 576



(2) INFORMATION FOR SEQ ID NO: 125: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 132 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125: 

TTACCAAGCA GGATCTGATG CAACTGGAAG AAGGCTTTGA ATATCGTATC ATTGGCTGCT 60

CCATGTATAA CATGTTGGCC GCCGTACGCG GTGCCTATGA CAGCTTTGAA AATGTCAAAG 120 

GGGTGAATTG CT 132 



(2) INFORMATION FOR SEQ ID NO: 126: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 542 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126: 

GATTAGGGGT CACTCAGGAT TATAAAAAAG CGGCAGAATA CTATAAAAAA GGTGATAAAA 60 

ATAATGATAT TACAGCACAA TACCGTCTGG CAAAACTTTA TGAACAAGGT AACGGTGTAA 120 

AACGTGATTA TCAACAAGCG ATAAACCTTT ACCTTAAACA TATCAACAGA ATGGATCACA 180 

TCACTGCCCC CAGTTTTGTG GCTCTGGGTG ATATCTATTC TCTGGGATTS GGGGTAGAGA 240

AAAACCCACA ACTGGCTGAA AAATGGTATC AAAAAGCGAT AGATGCAGCT AATACACAAC 300 

ATAACCAGGA AATAAATCAT TAAACGACAA CACTTAATAC CATATTGTGA AGATGTTCAG 360




ACATGGCGGA ATTCCCCTAT TCTTTGTTGG CGCTTACAAC AGACTATATT CCGCCATATC 420

TGTCTTTATT GTGTATAAAC CATCGATACT GATGTTTGAT AGTGCTAAAT AATCATTGGC 480

GCAATCACAA AGCCTAATGC CACTCCAGCA ATAATTCCCC CCAACCCAGG CAGGATAAAT 540

GG 542



(2) INFORMATION FOR SEQ ID NO: 127:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 382 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127: 



GAACCACTTA GCGGCAGCTA TCGGGAATCG CCTGCTGAAA GACGGTCAGA CAGTGATTGT 60

GGTTACCGTG GCTGATGTTA TGAGTGCCCT GCACGCCAGC TATGACGATG GGCAGTCAGG 120

CGAAAAATTT TTGCGGGAAC TGTGCGAAGT GGATCTGCTG GTTCTTGATG AAATTGGCAT 180

TCAGCGCGAG ACGAAAAACG AAGCAGGTGG TACTGCACCA GATTGTTGAT CGCCGGACAG 240

CGTCGATGCG CACGTGGGGA TRCTGACAAA CCTGAACTAT GAGGCCATGA AAACATTGCT 300

CGGCGARCGG ATTATGGATC RCATGACCAT GAACGGCGGG CGATGGGTGA ATTTTAACTG 360

GGAGACTGGC GTCCGAATGT CG 382



(2) INFORMATION FOR SEQ ID NO: 128: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128: 

CGTCCCGCAC CCGGAAATGG TCAGCGAACC AATCAGCAGG GTCATCGCTA GAAATCATCC 60 

TTAGCGAAAG CTAAGGATTT TTTTTATCTG AATTCTAGCC AGATCCCCGC TGATTTATGC 120 

TGGTTA 126 

(2) INFORMATION FOR SEQ ID NO: 129: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 258 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:

ACCCCCAGCC TACCTGGGGG TTTTCTGTGC ACAAAAAATC CCGGCATAAT GGCCGGGATT 60

TGCGAGGTTT CCCACTATTT CTTGATTCCT AAACGGAAGA TATGAGTTGG GAATAAAGGT 120

TGTATTATCA GTTGATCATT ANAAATGAAT AATTTGGGCG ATAAAGCTGT TACGTCATAG 180

ATATTTTCAG CGATTAATCT TAGANTTGAC CTAAAAACTG GAATACTTGC ATCATCTGCA 240

AAGACAAACA TGTCATCG 258 



(2) INFORMATION FOR SEQ ID NO: 130:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 399 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:

AACCAGCGGT TCGCATCATC TCATCCCACT GACTCTCCGC TTTTGACAGA TCTGCATATC 60 

CTCGGGCCAA CTTATCCAGT ACTCCGTAGT TTGCCGATTT ATTCACCCGC CAGAACACCG 120 

CCTCACCTGC ATCGGCAAGC CGGGGGGAAA ACTGATACCC CAGTAGCCAG AACAGACCGA 180 

AAATAATATC GCTGCTACCC GCAGTGTCTG TCATGATTTC AACTGGATTC AGCCCTGTCT 240

GCTGCTCAAG AAGTCCTTCC AGTACAAAAA TCGAATCCCG TAATGTACCG GGTACCACAA 300 

TGCCATGGAA CCCAGAGTAC TGATCAGATA CGAATTATAC CAGGTGATGC CTCGTCCAGA 360

ACCAAAATAT TTTCTGTTAG ATCCTGAGTT GATGGTCTT 399 



(2) INFORMATION FOR SEQ ID NO: 131: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 745 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131: 

AAATAACATC AACATACATT TGACTCGCGG GGGAAACGTT TACGGAGTCT TCATACTGGC 60 

ACTTTTTTAT GCTGCTGACT ACTCTTCGTC ATCGCCATCA ACATGCGCAC GAATCAGCGC 120 

CATAAACGGT TTGCCAAAGC GTTCCAGCTT GCGCATCCCA ACGCCGTTAA CGCTGAGCAT 180 

TTCGCTGGCG GTGATCGGCA TCTGTTCAGC CATCTCAATC AAGGTTGCGT CGTTAAACAC 240

CACGTACGGC GGGACATTAC TTTCATCGGC TATCGATTTA CGCAGTTTGC GTAATTNGGC 300 

GAACAGTTTG CGATCATAGT TGNCGCCGAN CGATNTCTGC ATCGCTTTCG GTTTGAGCGC 360

CACGATACGC GGCACGGCAA TTGCAAAGAG GATTCGCCGC GCAGCACCGG GCGCGCGGCC 420

TCTGTCAGTT GTAGGGCAGA ATGCTGGGCA ATATTTTGCG TCACCAGGCC GAGGTGAATC 480

AGCTGGCGGA TCACGCTCAC CCAATGTTCA TGGCTTTTAT CACGGCCCAT GCCATAGACT 540

TTCAGTTTGT CATGACCATA GTCGCGGATA CGCTGGTTAT TAGCACCACG AATCACTTCC 600 

ACCACATAAC CCATCCCAAA CCGCTGATTC ACACGACCAA TGGTGGAAAG GGCAATCTGA 660 

GCATCGGTTG AACCGTCGTA CTGTTTCGGC GGATCGAGGC AGATATCGCA GTTCNCCGCA 720 

CGGCTCCTGA CGCCCTTCGC CAAAA 745



(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 439 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:

AGAATGGCGG CTTCTTGCCC CCCTTTGCCC CGGTCCTGAC TAGCATGGCT GGAGTCCAGT 60 

GTCCAGGCCA CGACCATGCT CATCATGGAA GCAGCTTTTG TAGTACANTC GCAGCTTATT 120 

TTCCTGGAAC GAAATGTCTG GCATCGTGGT GCATAACATA ACCCCCAATG CCCAGCAGAT 180 

GCACAGAAGG TTCTAGAATC GCCCACTGAT ATCCCATACA AAATTTACCA AAACGTGTTC 240

GTATTTCTCG TATAAATAAT GTCTCTATGG TGACGTTCTA GACTTCAAAC CCACTTTTTG 300 

AATTTGATGA TGTGCTCCTA ATCTCTTCAG GAATGTAACG CCCTTGGTTT ACAGCTACCA 360 

ATACACTGGA GGTATACTTA TCTGCAACTG GATGAACTAG ATGTACTTGA GCAAACATTT 420 

CATAAGCTCG ACGACAGTT 439



(2) INFORMATION FOR SEQ ID NO: 133: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 350 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133: 

CTGGAAAGCG ACGTTGATGG ATTAATGCAG TCGGTAAAAC TGAACGCTGC TCAGGCAAGG 60 

CAGCAACTTC CTGATGACGC GACGCTGCGC CACCAANTCA TGGAACGTTT GATCATGGAT 120 

CAAMTCATCC TGCAGATGGG GCAGAAAATG GGAGTGAAAA TCTCCGATGA GCAGCTGGAT 180 

CAGGCGATTG CTAACATCGC GAAACAGNAC AACATGACGC TGGATCAGAT GCGCACCGTC 240

TGGCTTACGA TGGACTGAAC TACAACACCT ATCGTAACCA GATCCGCAAA GAGATGATTA 300 

TCTCTGAAGT GCGTAACAAC GAGGTGCGTC GTCGNATCAC CATCCTGCCG 350 
(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134: 

CCCCAAGATT GCTAACAAAT GCGCGTTGTT CATGCCGGAT GCGGCGTGAC CGCCTTATCC 60 

GGCCTACGAA ACCGCAAGAA TTCAATATAT TGCAGGAGCG GTGTAGGCCT GATAAGCGTA 120 

GCGAWTCAGG CAGTTTTGCG TTTGCCCGCA ACCTTAGGGG ACATTTAGCG ACCCCATTTA 180

TTTCTCACTT TTCCGCCTCA TCATCGCGCG TTAATTTCTT TCATGAATCA CGCTTTACAA 240

TATCCAGCGC GCGCANAACG GTACTGGCAG GGATCTGAAT TTTCCTCCAG CAGCACAATC 300 

AAATCGACAG CCAGTTTGAC ATCGTCAAGG GGCATTTTCC CAGTGACATA ATCTCTCCAT 360

TGCTAAGCGG GTTAAAACGC GCTAACCTGT TTCGATTTTT 400



(2) INFORMATION FOR SEQ ID NO: 135: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 463 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135: 

CTATCCTTAT GACCACCCAA CTACNTCATT TACACCCAAA CCAGCGATCT GAATAAAGAA 60 

GCGATTGCCC AGTTACGACT GGGCGGAAAA TGCGCGTAAG GATGAAGTAA AGTTTCAGTT 120

GAGCCTGGCA TTTCCCTGTG GCGTGGGATT TTAGGCCCGA ACTCGGTGTT GGGTGCGTCT 180 

TATACGCAAA AATCCTGGTG GCAACTGTCC AATAGCGAAG AGTCTTCACC GTTTCGTGAA 240

ACCAACTACG AACCGCAATT GTTCCTCGGT TTTGCCACCG ATTACCGTTT TGCAGGTTGG 300 

ACTGCGCGAT GTGGAGATGG GGTATAACCA CGACTCTAAA CGGGCGTTCC GACCCGACCT 360 

CCCGCAGCTG GAACCGCCTT TATACTCGCC TGATGGCAGA AAACGGTAAC TGGCTGGTAG 420

AAGTGAAGCC GNGGTATGTG GTGGGTAATA CTGACGATAA CCC 463



(2) INFORMATION FOR SEQ ID NO: 136: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 584 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136: 

TTGGTCAGCC GTACCTGAAT GGGGGCTGAT GCCCGGCTGG TTAATGGCAG GTGGTGTGAT 60

CGCGTGGTTT GTGGGTTGGC GCAAAACAGG CTGATTTTTT CATCGCTCAA GGCGGGCCGT 120 

GTAACGTATA ATGCGGCTTT GTTTAATGAT CATCTACGAC AGAGGAACAT GTATGGGTGG 180 

TATGAGTATT TGGGAGTTAT TGATTATTGC CGTCATCGTT GTAGTGCTTT TTGGCACCAA 240

AAAGCTCGGC TCCATCGGTT CCGATGTTGG TGGGTCGATC AAAGGCTTTA AAAAAGCAAT 300

GAGCGATGAT GAACCAAAGC AGGATAAAAC CAGTCAGGAT GCTGATTTTA CTGCGAAAAC 360

TATGGCCGAT AAGCAGGCGG ATACGAATCA GGAACAGGGT AAAACAGAAG ACGCGAAGCC 420

TACGNTAAAG AGCAGGTGTA ATCCGTGTTT GATATCGGTT TTAGCGNACT GCTATTGGTG 480

TTCATGATCG GCCTGGTCGT TCTGGGGGCG CAACGACTGC CTGTGGCGGT AAAAACGGTA 540

GCGGGCTGGA TTCGCGCGTT GCGTTCACTG GCGACAACGG TGCA 584



(2) INFORMATION FOR SEQ ID NO: 137: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 527 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137: 

GCAGGCAGGA GGAACTGCCC AGTGATACGG TTATTCGTGA TGGCGGAGGG CAGAGCCTTA 60 

ACGGACTGGC GTTGAACACC ACGCTGGATA ACAGAGTTGA GCATTGGNTA CACGGGGGAG 120

GGAAAGCAGA CGTTACAATT ATTAACCAGG ATGTTTACCC AGACCATAAA ACATGGCGGA 180 

TTGGCAACCG NAACCATCGT CAACACCGTT GCAGAAGKTG GTCCGGAGTC TGAAAATGTG 240

TCCAGCGGTC AGATGGTCGG AGGGACGGCT GAATCCACCA CCATCAACAA AAATGGCCGG 300

CAGTTATCTG GTCTTCGGGG ATGGCACGGG ACACCCTCAT TTGCGCTGGT GGTGACCAGA 360 

CGGTACACGG AGAGGCACAT AACACCCGAC TGGAGGGAGG TTAACCAGTA TGTACACAAC 420

GGTGGCACGG CAACAGAGAC GCTGATAAAC CGTGATGGCT GGCAGGTGAT TAAGGAAGGA 480

GGGAACTGCC GGCGCATTAC CACCATCAAN CCNGAAAAGG GAAANCT 527



(2) INFORMATION FOR SEQ ID NO: 138: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 441 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138: 
GTCAGTCTCT GGGGGAAGTG CGTGTTCCGA CCGGGGAAAT GTGGTGGAGA AAGTTATTGA 60 

AGGGGCTTAC GAGGTGGTGG GGGT7TTTGA CCGGATTGAG GAAAAGCGTG ATGCCATGCA 120

GTCGCTGATT CTGCCGCCAC CGGACGCCAG GCGCTGGCAC AGGCGGCACT GACTTAGCGT 180 

TATGGTGACG AACMTCARCC CGTCACCACC GCCGACATTC T3ACACCACG ACGCCGGGAR 240

GATTAGGGTA AGGACGTGTG GAGTGGTTAT GAGACCATTG AGGAGAATAT GCTGAAAGGC 300

GGAATTTCCG GTCGCAGTGC CAGAGGAAAA GGTATCCATA CGGGTGCCAT TCACAGCATC 360

GACACCGACA TTAAGCTCAA CCGGGGATTG TGGGTGATGG CTGAAACGCT GCTGGAGAGT 420

ATGGGCTGAT GCCGTTTCCN T 441



(2) INFORMATION FOR SEQ ID NO: 139: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 398 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139: 

CGAGCGAGAT GAACTTCGAG GGCGGTGTGA GCCAGTCGGC TTACGAGACA CTGGCGGCGC 60 

TTAATCTGCC GAAACCGCAG CAAGGGCCGG AAACCATTAA TCAGGTTACC GAGCATAAGA 120

TGTCAGCTGA GTAAGCCTGT ATGCCGGATA AGGCGCTCGC GCCNATTCCG ATGAAATAAG 180 

GCGCATCGGG CCTGAAGGAA AGCCGTATGN ATACACCCGC AGCCCGCATC CGGCAAGTTA 240

CAACAAATAA CCTTTAACCA TGCTTTTTGA TGTTTTTCAG CAATACCCCG CGGCGATGCC 300 

CATACTGGCA ACCGTCGGGA GGGATTGATC ATCGGCAGTT TTTTGAATGT GGTGATTTGG 360

GCGTTACCCC ATCATGCTGC GCCAACAAAT GGCGGAGT 398



(2) INFORMATION FOR SEQ ID NO: 140:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 580 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:

GCCGAACAGA CACAGCAATA TGAACCCTGC CAGCGCAGAC GCTTGCTGAT TAATGCTCTG 60 

AACAAAAGGC GAAGAATGGC AAATCCTGCG ATCAGCAAAG TCAGCGCACC GACTATCTGT 120 

AACATAGTCA CTCCGTGATG AATATCATGT GTATTGTGAA TGCCAGTGAA TGTGGCACTG 180 

AAGCGTTTGC ACCTGTCCGG GTCCCGGTCA TGATGACCGS AACAGAGAGA CAATGCCGAA 240

TTATCAGAAG GTCACATTCA GTGTGGCTTG GCCGTTATAA CCTTCAGCGC TGCTGCCGCT 300 

GACGCTGTGG GCATAACCGG CCTGAACGCC CAGGGTGATA TTTTCCCGGA CACGGGCTTC 360

CAGTCCGGCC TGCAGCTCCA GTGACGTGCC ATTCCGGGAC GGTGAGAACG TCATGTTACT 420

GCCGGCTGCG GCTGTACCCA TGCTCATGTC TCCCCGGGAG CTGAAGGTGC GGATAACAGA 480

AGGCTGTACC CACCCGTTCA CCGGCAGTTC ACGCACACTG TGTTTTGCAC TGTCACGCAA 540

GGTGTCACGG GATGAGGTGC CTTCANCAAA AGGTCATATT 580

(2) INFORMATION FOR SEQ ID NO: 141:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 446 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141: 

TGCGGACATC CAGCGTTCCG CCATCATCCA CACGGGTTCT GGTGGCTGTG TGTCCGGTCA 60 

GCACATCCAG ACGGCCGCCA TTTTCCAGTA CGACATTATC AGCTTTACCC TCCACAACAG 120 

AGAATGCTCC CAGGCGGTTT GTGCCGGTGA CGGTTGCAGC AGTGCTGGTA ACCAGTGCTC 180 

CGCCCGTGTT CTGGGTGACA TCAGACGCTT TACCGCCGGC ATTCACCTGC AGCTTTCCTT 240

TCTGGTTGAT GGTGGTATGC GCGGCAGTTC CTCCTTCCTT AATCAMCTGC CAGCCATCAC 300 

GGTTTATCAG CGTCTCTGTT GCCGTGCCAA CGTTGTGTAC ATACTGGTTA MCTCCCTCCA 360 

GTCGGGTGTT AWGTGSCTCT CCGTGTANCG TCTGGTCANC AACAACGCAA ATGANGGTGT 420

CCCGTGCCAT CCCCGAAGAC CAGTAA 446 

(2) INFORMATION FOR SEQ ID NO: 142:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 327 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142: 

TGAATACGTT AAGTCAGCAG ACCGGCGGAG ACAGTCTGAC ACAGACAGCG CTGCAGCAGT 60 

ATGAGCCGGT GGTGGTTGGC TCTCCGCAAT GGCACGATGA ACTGGCAGGT GCCCTGAATA 120 

ATATTGCCGG AGTTCGCCAC TGACCGGTCA GACCGGTATC AGTGATGACT GGCCACTGCC 180 

TTCCGTCAAC AATGGATACC TGGTTCCGTC CACGGACCCG GACAGTCCGT ATCTGATTAC 240

GGTGAACCCG AAACTGGATR GTCTCGGACA GGTGGACAGC CATTTGTTTN CCGGACTGTA 300 

TGAGCTTCTT GGAGCGAAAC CGGGTCA 327 

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)

A. The indications made below relate to the microorganism referred to in the description

on page 5, line 8

B. IDENTIFICATION OF DEPOSIT Further deposits are identified on an additional sheet [X]
Name of depositary institution 

American Type Culture Collection (ATCC) 

Address of depositary institution (including postal code and country) 

12301 Parklawn Drive 
Rockville, Maryland 20852 
United States of America 



Date of deposit 


Accession Number 




September 23, 1996 




97726 



C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet [ ]



DNA plasmid PAI-1 



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 



The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., 'Accession
Number of Deposit') 



For receiving Office use only 



[ ] This sheet was received with the international application



Authorized officer 



For International Bureau use only 



[ ] This sheet was received by the International Bureau on:



Authorized officer 



Form PCT/RO/134 (July 1992)

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM



(PCT Rule 13bis)


A. The indications made below relate to the microorganism referred to in the description
1 R 8 

I nn page D .line _ ■ 


B. IDENTIFICATION OF DEPOSIT


Further deposits are identified on an additional sheet [ ]


Name of depositary institution




American Type Culture Collection (ATCC)




Address of depositary institution (including postal code and country)




12301 Parklawn Drive
Rockville, Maryland 20852
United States of America




Date of deposit

September 23, 1996


Accession Number 

97727 



C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet [ ]



DNA plasmid PAI-2 



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 



The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., 'Accession
Number of Deposit')



For receiving Office use only 



[ ] This sheet was received with the international application



Authorized officer 



For International Bureau use only 



[ ] This sheet was received by the International Bureau on:



Authorized officer 



Form PCT/RO/134 (July 1992)


INDICATIONS RELATING TO A DEPOSITED MICROORGANISM 

(PCT Rule 13bis)



A. The indications made below relate to the microorganism referred to in the description

on page 5, line 14



B. IDENTIFICATION OF DEPOSIT 



Further deposits are identified on an additional sheet [ ]



Name of depositary institution 

American Type Culture Collection (ATCC) 



Address of depositary institution (including postal code and country) 

12301 Parklawn Drive 
Rockville, Maryland 20852
United States of America 



Date of deposit 

September 23, 1996 



Accession Number 



98176 



C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet [ ]

Escherichia coli J96



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States) 



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 



The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession
Number of Deposit") 



For receiving Office use only 



[ ] This sheet was received with the international application



Authorized officer 



For International Bureau use only 



[ ] This sheet was received by the International Bureau on:



Authorized officer 



Form PCT/RO/134 (July 1992)

What Is Claimed Is:



1. An isolated nucleic acid molecule, comprising a polynucleotide having a nucleotide sequence at least 95% identical to a sequence selected from the group consisting of:

(a) a nucleotide sequence of an open reading frame depicted in one of Tables 1 through 4;

(b) a nucleotide sequence beginning with the first initiation codon encountered reading 5' to 3' in an open reading frame depicted in one of Tables 1 through 4, and ending with the 3' terminal stop codon;

(c) a nucleotide sequence beginning with the first initiation codon encountered reading 5' to 3' in an open reading frame depicted in one of Tables 1 through 4, and ending with the nucleotide preceding the 3' terminal stop codon;

(d) a nucleotide sequence of (a) excluding codons for amino acids eliminated during processing of the putative protein identified in one of Tables 1 through 4; or

(e) a nucleotide sequence that is complementary to any of the nucleotide sequences in (a), (b), (c), or (d).

2. An isolated nucleic acid molecule of claim 1, wherein said nucleotide sequence is 100% identical to the nucleotide sequence of an open reading frame depicted in Tables 1 through 4, or a complement thereof.

3. An isolated nucleic acid molecule, comprising a polynucleotide that hybridizes under stringent hybridization conditions to a nucleic acid molecule of claim 2.

4. An isolated nucleic acid molecule, comprising a polynucleotide that encodes the amino acid sequence of an epitope-bearing portion of an E. coli J96 PAI protein encoded by an open reading frame depicted in one of Tables 1 through 4.

5. A method of making a recombinant vector, comprising inserting an isolated nucleic acid molecule of claim 1 into a vector.

6. A recombinant vector produced by the method of claim 5.

7. A method of making a recombinant host cell, comprising introducing a recombinant vector of claim 6 into a host cell.

8. A recombinant host cell produced by the method of claim 7.

9. A recombinant method for producing an E. coli J96 PAI polypeptide, comprising culturing a recombinant host cell of claim 8 under conditions such that said polypeptide is expressed and recovering said polypeptide.

10. An isolated polypeptide of an E. coli J96 PAI IV or PAI V protein encoded by a polynucleotide of claim 1.

11. An isolated polypeptide of an E. coli J96 PAI IV or PAI V protein encoded by a polynucleotide of claim 2.

12. An isolated polypeptide comprising an immunogenic epitope of an E. coli J96 PAI IV or PAI V protein encoded for by an open reading frame depicted in one of Tables 1, 2, 3 or 4.

13. A vaccine, in dosage form, comprising
(a) a pharmaceutically acceptable diluent, carrier, or excipient, and
(b) an antigen selected from the group consisting of:
(i) a polypeptide having an amino acid sequence at least 95% identical to an amino acid sequence encoded by a uropathogenic E. coli J96 PAI IV or PAI V open reading frame depicted in Tables 1, 2, 3 or 4, and
(ii) a polypeptide comprising an immunogenic epitope of an E. coli J96 PAI IV or PAI V protein encoded for by an open reading frame depicted in one of Tables 1, 2, 3 or 4;
wherein said antigen is present in an amount effective to elicit protective immune responses in an animal to pathogenic E. coli.

14. An isolated antibody that binds specifically to a polypeptide of claim 10 or 11.

15. An antibody having binding affinity to a polypeptide according to claim 12.

16. A method of detecting a pathogenic E. coli antigen in a sample, comprising:
(a) contacting said sample with an antibody according to claim 14 or 15 under conditions such that immunocomplexes form, and
(b) detecting the presence of said antibody bound to said antigen.

17. A diagnostic kit comprising:
(a) a first container means containing an antibody according to claim 14 or 15 and
(b) second container means containing a conjugate comprising a binding partner of said antibody and a label.

18. A hybridoma which produces an antibody according to claim 14 or 15.

19. A method of detecting the presence of antibodies to pathogenic E. coli in a sample, comprising:
(a) contacting said sample with a polypeptide according to one of claims 10, 11 or 12 under conditions such that immunocomplexes form, and
(b) detecting the presence of said antibody bound to said antigen.

20. A kit for detecting the presence of antibodies to pathogenic E. coli in a sample comprising at least one container means having disposed therein a polypeptide according to one of claims 10, 11 or 12.

21. Computer readable medium having recorded thereon one or more nucleotide sequences depicted in SEQ ID NOs: 1 through 142, or nucleotide sequences at least 99.9% identical thereto.

22. Computer readable medium having recorded thereon a nucleotide sequence of at least one uropathogenic E. coli J96 pathogenicity island open reading frame depicted in Tables 1 through 4, or a complement thereof.

23. The computer readable medium of claim 21, wherein said medium is selected from the group consisting of a floppy disc, a hard disc, random access memory (RAM), read only memory (ROM), and CD-ROM.

24. The computer readable medium of claim 22, wherein said medium is selected from the group consisting of a floppy disc, a hard disc, random access memory (RAM), read only memory (ROM), and CD-ROM.

25. A computer-based system for identifying fragments of uropathogenic E. coli J96 pathogenicity islands PAI IV and PAI V that are homologous to target nucleotide sequences, comprising:
a) a data storage means comprising a nucleotide sequence of SEQ ID NOs: 1 through 142, or a nucleotide sequence at least 99.9% identical thereto;
b) a search means for comparing a target sequence to said nucleotide sequence of said data storage means of step (a) to identify a homologous sequence, and
c) a retrieval means for obtaining said homologous sequence of step (b).

Nucleotide Sequence of Escherichia coli
Pathogenicity Islands 

Background of the Invention 

Field of the Invention 

The present invention relates to novel genes located in two chromosomal 
regions within E. coli that are associated with virulence. These chromosomal 
regions are known as pathogenicity islands (PAIs). 

Related Background Art 

Escherichia coli (E. coli) is a normal inhabitant of the intestine of humans and various animals. Pathogenic E. coli strains are able to cause infections of the intestine (intestinal E. coli strains) and of other organs such as the urinary tract (uropathogenic E. coli) or the brain (extraintestinal E. coli). Intestinal pathogenic E. coli are a well-established and leading cause of severe infantile diarrhea in the developing world. Additionally, cases of newborn meningitis and sepsis have been attributed to E. coli pathogens.

In contrast to non-pathogenic isolates, pathogenic E. coli produce pathogenicity factors which contribute to the ability of strains to cause infectious diseases (Muhldorfer, I. and Hacker, J., Microb. Pathogen. 16:171-181 1994). Adhesins facilitate binding of pathogenic bacteria to host tissues. Pathogenic E. coli strains also express toxins, including haemolysins, which are involved in the destruction of host cells, and surface structures such as O-antigens, capsules or membrane proteins, which protect the bacteria from the action of phagocytes or the complement system (Ritter, et al., Mol. Microbiol. 17:109-212 1995).

The genes coding for pathogenicity factors of intestinal E. coli are located on large plasmids, on phage genomes or on the chromosome. In contrast to intestinal E. coli, pathogenicity determinants of uropathogenic and other extraintestinal E. coli are, in most cases, located on the chromosome. Id.

Large chromosomal regions in pathogenic bacteria that encode adjacently located virulence genes have been termed pathogenicity islands ("PAIs"). PAIs are large fragments of DNA comprising a group of virulence genes that behave as a distinct molecular and functional unit, much like an island within the bacterial chromosome. For example, intact PAIs appear to transfer between organisms and confer complex virulence properties on the recipient bacteria.

Chromosomal PAIs in bacterial cells have been described in increasing detail in recent years. For example, J. Hacker and co-workers described two large, unstable regions in the chromosome of uropathogenic Escherichia coli strain 536 as PAI-I and PAI-II (Hacker J., et al., Microbiol. Pathog. 8:213-25 1990). Hacker found that PAI-I and PAI-II, which contain virulence regions, can be lost by spontaneous deletion due to recombination events. Both of these PAIs were found to encode multiple virulence genes, and their loss resulted in reduced hemolytic activity, serum resistance, mannose-resistant hemagglutination, uroepithelial cell binding, and mouse virulence of the E. coli (Knapp, S., et al., J. Bacteriol. 168:22-30 1986). Therefore, pathogenicity islands are characterized by their ability to confer complex virulence phenotypes on bacterial cells.

In addition to E. coli, specific deletion of large virulence regions has been observed in other bacteria such as Yersinia pestis. For example, Fetherston and co-workers found that spontaneous deletion of a 102-kb region of the Y. pestis chromosome resulted in the loss of many Y. pestis virulence phenotypes (Fetherston, J.D. and Perry, R.D., Mol. Microbiol. 13:697-708 1994; Fetherston, et al., Mol. Microbiol. 6:2693-704 1992). In this instance, the deletion appeared to be due to recombination within 2.2-kb repetitive elements at both ends of the 102-kb region.

Deletion of PAIs may benefit the organism by modulating bacterial virulence or genome size during infection. PAIs may also represent foreign DNA segments, acquired during bacterial evolution, that conferred important pathogenic properties on the bacteria. Flanking repeats, such as those observed in Y. pestis, may suggest a common mechanism by which these virulence genes were integrated into the bacterial chromosomes.

Integration of virulence genes into bacterial chromosomes was further elucidated by the discovery and characterization of a locus of enterocyte effacement (the LEE locus) in enteropathogenic E. coli (McDaniel, et al., Proc. Natl. Acad. Sci. (USA) 92:1664-8 1995). The LEE locus spans 35 kb and encodes many genes required for these bacteria to "invade" and degrade the apical structure of enterocytes, causing diarrhea. Although the LEE and PAI-I loci encode different virulence genes, these elements are located at the same site in the E. coli genome and contain the same DNA sequence within their right-hand ends, suggesting a common mechanism for their insertion.

Besides being found in enteropathogenic E. coli, the LEE element is also present in rabbit diarrheal E. coli, Hafnia alvei, and Citrobacter freundii biotype 4280, all of which induce attaching and effacing lesions on the apical face of enterocytes. The LEE locus appears to be inserted in the bacterial chromosome as a discrete molecular and functional virulence unit in the same fashion as PAI-I, PAI-II, and the Yersinia PAI.

Along these same lines, a 40-kb Salmonella typhimurium PAI has been characterized on the bacterial chromosome; it encodes genes required for Salmonella entry into nonphagocytic epithelial cells of the intestine (Mills, D.M., et al., Mol. Microbiol. 15:749-59 1995). Like the LEE element, this PAI confers on Salmonella the ability to invade intestinal cells, and hence may likewise be characterized as an "invasion" PAI.

The pathogenicity islands described above all share the common feature of conferring complex virulence properties on the recipient bacteria. However, they may be separated into two types by their respective contributions to virulence. PAI-I, PAI-II and the Y. pestis PAI confer multiple virulence phenotypes, while the LEE and the S. typhimurium "invasion" PAI encode many genes specifying a single, complex virulence process.

It is advantageous to characterize closely related bacteria that do or do not contain a PAI by isolating the PAI as a discrete molecular and functional unit on the bacterial chromosome. Since the presence versus the absence of essential virulence genes can often distinguish closely related virulent and avirulent bacterial strains or species, experiments have been conducted to identify virulence loci and potential PAIs by isolating DNA sequences that are unique to virulent bacteria (Bloch, C.A., et al., J. Bacteriol. 176:7121-5 1994; Groisman, E.A., EMBO J. 12:3779-87 1993).

At least two PAIs are present in E. coli J96. These PAIs, PAI IV and PAI V, are linked to tRNA loci, but at sites different from those occupied by other known E. coli PAIs (Swenson et al., Infect. Immun. 64:3736-3743 (1996)).

The era of true comparative genomics has been ushered in by high through-put genomic sequencing and analysis. The first two complete bacterial genome sequences, those of Haemophilus influenzae and Mycoplasma genitalium, were recently described (Fleischmann, R.D., et al., Science 269:496 (1995); Fraser, C.M., et al., Science 270:191 (1995)). Large-scale DNA sequencing efforts have also produced an extensive collection of sequence data from eukaryotes, including Homo sapiens (Adams, M.D., et al., Nature 377:3 (1995)) and Saccharomyces cerevisiae (Levy, J., Yeast 10:1689 (1994)).

The need continues to exist for the application of high through-put sequencing and analysis to the study of the genomes and subgenomes of infectious organisms. Further, a need exists for genetic markers that can be employed to distinguish closely related virulent and avirulent strains of a given bacterial species.

Summary of the Invention 

The present invention is based on the high through-put, random sequencing of cosmid clones covering two pathogenicity islands (PAIs) of uropathogenic Escherichia coli strain J96 (O4:K6; E. coli J96). PAIs are large fragments of DNA which comprise pathogenicity determinants. PAI IV is located at approximately 64 min (near pheV) on the E. coli chromosome and is greater than 170 kilobases in size. PAI V is located at approximately 94 min (at pheR) on the E. coli chromosome and is approximately 106 kb in size. These PAIs differ in location from the PAIs described by Hacker and colleagues for uropathogenic strain 536 (PAI I, 82 minutes (selC), and PAI II, 97 minutes (leuX)).

The locations of the PAIs relative to one another, and the cosmid clones covering the J96 PAIs, are shown in Figure 1. The present invention relates to the nucleotide sequences of 142 fragments of DNA (contigs) covering the PAI IV and PAI V regions of the E. coli J96 chromosome. The nucleotide sequences shown in SEQ ID NOs: 1 through 142 were obtained by shotgun sequencing eleven E. coli J96 subclones, which were deposited in two pools on September 23, 1996 at the American Type Culture Collection, 12301 Parklawn Drive, Rockville, Maryland 20852, and given accession numbers 97726 (includes 7 cosmid clones covering PAI IV) and 97727 (includes 4 cosmid clones covering PAI V). The deposited sets or "pools" of clones are more fully described in Example 1. In addition, E. coli strain J96 was deposited at the American Type Culture Collection on September 23, 1996, and given accession number 98176.

Three hundred fifty-one open reading frames have been thus far identified 
in the 142 contigs described by SEQ ID NOs: 1 through 142. Thus, the present 
invention is directed to isolated nucleic acid molecules comprising open reading 
frames (ORFs) encoding E. coli J96 PAI proteins, and fragments of said nucleic 
acid molecules. 

The present invention also relates to variants of the nucleic acid molecules of the present invention, which encode portions, analogs or derivatives of E. coli J96 PAI proteins. Further embodiments include isolated nucleic acid molecules comprising a polynucleotide having a nucleotide sequence at least 90% identical, and more preferably at least 95%, 96%, 97%, 98% or 99% identical, to the nucleotide sequence of an E. coli J96 PAI ORF described herein, and fragments of said nucleic acid molecules.

The present invention also relates to recombinant vectors, which include the isolated nucleic acid molecules of the present invention and fragments thereof, host cells containing the recombinant vectors, as well as methods for making such vectors and host cells for E. coli J96 PAI protein production by recombinant techniques.

The invention further provides isolated polypeptides encoded by the E. coli J96 PAI ORFs or fragments of said ORFs. It will be recognized that some amino acid sequences of the polypeptides described herein can be varied without significant effect on the structure or function of the protein. If such differences in sequence are contemplated, it should be remembered that there will be critical areas on the protein which determine activity. In general, it is possible to replace residues which form the tertiary structure, provided that residues performing a similar function are used. In other instances, the type of residue may be completely unimportant if the alteration occurs at a non-critical region of the protein.

In another aspect, the invention provides a peptide or polypeptide comprising an epitope-bearing portion of a polypeptide of the invention. The epitope-bearing portion is an immunogenic or antigenic epitope useful for raising antibodies.

The invention further provides a vaccine comprising one or more E. coli J96 PAI antigens together with a pharmaceutically acceptable diluent, carrier, or excipient, wherein the one or more antigens are present in an amount effective to elicit protective antibodies in an animal to pathogenic E. coli, such as strain J96.

The invention also provides a method of eliciting a protective immune 
response in an animal comprising administering to the animal the above- 
described vaccine. 

The invention further provides a method for identifying pathogenic E. coli in an animal comprising analyzing tissue or body fluid from the animal for one or more of:

(a) polynucleic acids encoding an open reading frame listed in Tables 1-4 or a fragment of said polynucleic acid;

(b) full length or mature polypeptides encoded for by an open reading frame listed in Tables 1-4; or

(c) antibodies specific to polypeptides encoded for by an open reading frame listed in Tables 1-4.

The invention further provides a nucleic acid probe for the detection of the presence of one or more E. coli PAI nucleic acids (nucleic acids encoding one or more ORFs as listed in Tables 1-4) in a sample from an individual, comprising one or more nucleic acid molecules sufficient to specifically detect, under stringent hybridization conditions, the presence of the above-described molecule in the sample.

The invention also provides a method of detecting E. coli PAI nucleic acids in a sample comprising:

a) contacting the sample with the above-described nucleic acid probe, under conditions such that hybridization occurs, and

b) detecting the presence of the probe bound to an E. coli PAI nucleic acid.

The invention further provides a kit for detecting the presence of one or more E. coli PAI nucleic acids in a sample comprising at least one container means having disposed therein the above-described nucleic acid probe.

The invention also provides a diagnostic kit for detecting the presence of pathogenic E. coli in a sample comprising at least one container means having disposed therein one or more of the above-described antibodies.

The invention also provides a diagnostic kit for detecting the presence of antibodies to pathogenic E. coli in a sample comprising at least one container means having disposed therein one or more of the above-described antigens.

Brief Description of the Figures

Figure 1 is a schematic diagram of cosmid clones derived from the E. coli J96 pathogenicity islands and of the map positions of known E. coli PAIs (not drawn to scale). The gray bar represents the E. coli K-12 chromosome, with minute demarcations of PAI junction points located above the bar. E. coli J96 overlapping cosmid clones are represented by hatched bars (overlap not drawn to scale), with the positions of the hly, pap, and prs operons indicated above each bar. The PAIs and their estimated sizes are shown above and below the K-12 chromosome map.

Figure 2 is a block diagram of a computer system 102 that can be used to implement the computer-based systems of the present invention.

Detailed Description of the Invention

The present invention is based on high through-put, random sequencing of a uropathogenic strain of Escherichia coli. The DNA sequences of contiguous DNA fragments covering the pathogenicity islands PAI IV (also referred to as PAI J96(pheV)) and PAI V (also referred to as PAI J96(pheR)) from the chromosome of the E. coli uropathogenic strain J96 (O4:K6) were determined. The sequences were used for DNA and protein sequence similarity searches of public sequence databases.

The primary nucleotide sequences generated by shotgun sequencing cosmid clones of the PAI IV and PAI V regions of the E. coli chromosome are provided in SEQ ID NOs: 1 through 142. These sequences represent contiguous fragments of the PAI DNA. As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC nomenclature system. The present invention provides the nucleotide sequences of SEQ ID NOs: 1 through 142, or representative fragments thereof, in a form that can be readily used, analyzed, and interpreted by a skilled artisan. Within these 142 sequences, 351 open reading frames (ORFs) have thus far been identified; they are described in greater detail below.
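
As a small illustration of the IUPAC notation mentioned above, the Python sketch below checks a contig string against the standard IUPAC nucleotide alphabet (A, C, G and T plus ambiguity codes such as N, R and Y) and reports where ambiguity codes occur. The helper names are hypothetical, and reporting 1-based positions is an assumption chosen to resemble the start and stop numbering used in the Tables.

```python
from typing import List

# Standard IUPAC nucleotide codes mapped to the bases each one may stand for.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def invalid_positions(seq: str) -> List[int]:
    """1-based positions of characters that are not IUPAC nucleotide codes."""
    return [i for i, ch in enumerate(seq.upper(), start=1) if ch not in IUPAC_NUCLEOTIDES]


def ambiguous_positions(seq: str) -> List[int]:
    """1-based positions holding an ambiguity code rather than A, C, G or T."""
    return [
        i for i, ch in enumerate(seq.upper(), start=1)
        if ch in IUPAC_NUCLEOTIDES and len(IUPAC_NUCLEOTIDES[ch]) > 1
    ]


if __name__ == "__main__":
    print(invalid_positions("ATGNRX"))    # [6]
    print(ambiguous_positions("ATGNRX"))  # [4, 5]
```

Positions flagged as ambiguous are the natural places to look when resolving the rare sequencing errors discussed below.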

As used herein, a "representative fragment" refers to E. coli J96 PAI protein-encoding regions (also referred to herein as open reading frames or ORFs), expression modulating fragments, and fragments that can be used to diagnose the presence of E. coli in a sample. A non-limiting identification of such representative fragments is provided in Tables 1 through 6, preferably in Tables 1 through 4. As described in detail below, representative fragments of the present invention further include nucleic acid molecules having a nucleotide sequence at least 95% identical, preferably at least 96%, 97%, 98%, or 99% identical, to an ORF identified in Tables 1 through 6, or more preferably Tables 1 through 4.

As indicated above, the nucleotide sequence information provided in SEQ ID NOs: 1 through 142 was obtained by sequencing cosmid clones covering the PAIs located on the chromosome of E. coli J96 using a megabase shotgun sequencing method. The sequences provided in SEQ ID NOs: 1 through 142 are a highly accurate, although not necessarily 100% perfect, representation of the nucleotide sequences of contiguous stretches of DNA (contigs) which include the ORFs located on the two pathogenicity islands of E. coli J96. As discussed in detail below, using the information provided in SEQ ID NOs: 1 through 142 and in Tables 1 through 6 together with routine cloning and sequencing methods, one of ordinary skill in the art would be able to clone and sequence all "representative fragments" of interest, including open reading frames (ORFs) encoding a large variety of E. coli J96 PAI proteins. In rare instances, this may reveal a nucleotide sequence error present in the nucleotide sequences disclosed in SEQ ID NOs: 1 through 142. Thus, once the present invention is made available (i.e., once the information in SEQ ID NOs: 1 through 142 and in Tables 1 through 6 is made available), resolving a rare sequencing error would be well within the skill of the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' AutoAssembler™ can be used as an aid during visual inspection of nucleotide sequences.

Even if all of the rare sequencing errors were corrected, it is predicted that the resulting nucleotide sequences would still be at least about 99.9% identical to the reference nucleotide sequences in SEQ ID NOs: 1 through 142. Thus, the present invention further provides nucleotide sequences that are at least 99.9% identical to the nucleotide sequence of SEQ ID NOs: 1 through 142 in a form which can be readily used, analyzed and interpreted by the skilled artisan. Methods for determining whether a nucleotide sequence is at least 99.9% identical to a reference nucleotide sequence of the present invention are described below.
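
A 99.9% threshold corresponds to no more than one differing nucleotide per 1,000 nucleotides of the reference. The Python sketch below shows one simple way to score this. It assumes the two sequences have already been aligned by some alignment program (with '-' marking gaps) and is not necessarily the method referred to above. Identity is computed as identical positions divided by the number of reference nucleotides, so substitutions and deletions in the query both lower the score, while extra query bases opposite a reference gap are not counted against it. The function names are hypothetical.

```python
def percent_identity(query_aln: str, ref_aln: str) -> float:
    """Percent identity of a pre-aligned query relative to the reference.

    Both strings come from the same alignment: equal length, '-' for gaps.
    The denominator is the number of reference nucleotides (non-gap positions).
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for r in ref_aln if r != "-")
    if ref_len == 0:
        raise ValueError("reference contains no nucleotides")
    matches = sum(
        1 for q, r in zip(query_aln.upper(), ref_aln.upper()) if r != "-" and q == r
    )
    return 100.0 * matches / ref_len


def meets_identity_threshold(query_aln: str, ref_aln: str, threshold: float = 99.9) -> bool:
    """True if the query is at least `threshold` percent identical to the reference."""
    return percent_identity(query_aln, ref_aln) >= threshold


if __name__ == "__main__":
    reference = "ACGT" * 250              # 1,000-nt toy reference
    one_change = "T" + reference[1:]      # one substitution
    two_changes = "TT" + reference[2:]    # two substitutions
    print(percent_identity(one_change, reference))          # 99.9
    print(meets_identity_threshold(one_change, reference))   # True
    print(meets_identity_threshold(two_changes, reference))  # False
```
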

Nucleic Acid Molecules

The present invention is directed to isolated nucleic acid fragments of the 
PAIs of E. coli J96. Such fragments include, but are not limited to, nucleic acid 
molecules encoding polypeptides, nucleic acid molecules that modulate the 
expression of an operably linked ORF (hereinafter expression modulating 
fragments (EMFs)), and nucleic acid molecules that can be used to diagnose the 
presence of E. coli in a sample (hereinafter diagnostic fragments (DFs)). 

By "isolated nucleic acid molecule(s)" is intended a nucleic acid molecule, DNA or RNA, that has been removed from its native environment. For example, recombinant DNA molecules contained in a vector are considered isolated for the purposes of the present invention. Further examples of isolated DNA molecules include recombinant DNA molecules maintained in heterologous host cells, purified (partially or substantially) DNA molecules in solution, and nucleic acid molecules produced synthetically. Isolated RNA molecules include in vitro RNA transcripts of the DNA molecules of the present invention.

In one embodiment, E. coli J96 PAI DNA can be mechanically sheared to produce fragments about 15-20 kb in length, which can be used to generate an E. coli J96 PAI DNA library by insertion into lambda clones as described in Example 1 below. Primers flanking an ORF described in Tables 1 through 6 can then be generated using the nucleotide sequence information provided in SEQ ID NOs: 1 through 142. The polymerase chain reaction (PCR) is then used to amplify and isolate the ORF from the lambda DNA library. PCR cloning is well known in the art. Thus, given SEQ ID NOs: 1 through 142 and Tables 1 through 6, it would be routine to isolate any ORF or other representative fragment of the E. coli J96 PAI subgenomes. Isolated nucleic acid molecules of the present invention include, but are not limited to, single-stranded and double-stranded DNA, single-stranded RNA, and complements thereof.
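
The step of generating primers that flank an ORF can be sketched in Python as below. This is a hypothetical helper, not the procedure of Example 1. It assumes a forward-strand ORF given by 1-based, inclusive start and stop positions, takes the forward primer from the contig sequence immediately upstream of the ORF, and takes the reverse primer as the reverse complement of the sequence immediately downstream. A practical primer design would also weigh melting temperature, GC content and uniqueness within the library, which this sketch ignores.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(contig: str, orf_start: int, orf_stop: int,
                     primer_len: int = 20, margin: int = 0) -> Tuple[str, str]:
    """Hypothetical helper: primers just outside a forward-strand ORF.

    orf_start/orf_stop are 1-based, inclusive contig positions (orf_start < orf_stop).
    `margin` leaves extra bases between each primer site and the ORF.
    """
    fwd_end = orf_start - 1 - margin   # 0-based, exclusive end of the forward primer site
    rev_begin = orf_stop + margin      # 0-based start of the reverse primer site
    if fwd_end - primer_len < 0 or rev_begin + primer_len > len(contig):
        raise ValueError("not enough flanking sequence in the contig")
    forward = contig[fwd_end - primer_len:fwd_end]
    reverse = reverse_complement(contig[rev_begin:rev_begin + primer_len])
    return forward, reverse


if __name__ == "__main__":
    toy_contig = "AAAAA" + "ATGCCCTAA" + "GGGGG"   # toy ORF at positions 6-14
    print(flanking_primers(toy_contig, 6, 14, primer_len=5))  # ('AAAAA', 'CCCCC')
```
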

Tables 1 through 6 herein describe ORFs in the E. coli J96 PAI cosmid 
clone library. 

Tables 1 and 3 list, for PAI IV and PAI V, respectively, a number of ORFs that putatively encode a recited protein based on homology matching with protein sequences from an organism listed in the Table. Tables 1 and 3 indicate the location of each ORF by reference to its position within one of the 142 E. coli J96 contigs described in SEQ ID NOs: 1 through 142. Column 1 of Tables 1 and 3 provides the Sequence ID Number (SEQ ID NO) of the contig in which a particular open reading frame is located. Column 2 numerically identifies a particular ORF on a particular contig (SEQ ID NO), since many contigs comprise a plurality of ORFs. Columns 3 and 4 indicate an ORF's position in the nucleotide sequence (contig) provided in SEQ ID NOs: 1 through 142 by referring to start and stop positions in the contig sequence.

One of ordinary skill in the art will appreciate that the ORFs may be oriented in opposite directions in the E. coli chromosome. This is reflected in columns 3 and 4. For these ORFs, the sense strand is complementary to the actual sequence given. The corresponding sense strand of the ORF must be read as the 5'-3' complement of the antisense strand actually shown in the Sequence Listing, wherein the location is specified 3'-5'.
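
Reading a complementary-strand ORF out of a contig can be sketched in Python as below. The sketch assumes, as one common convention, that such an ORF is listed with its start position larger than its stop position, and that positions are 1-based and inclusive. The function names are hypothetical. For a complementary-strand ORF it takes the listed stretch of the contig and returns its reverse complement, i.e. the sense strand read 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; N is kept as N."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_sequence(contig: str, start: int, stop: int) -> str:
    """Sense-strand sequence of an ORF from 1-based, inclusive start/stop positions.

    Assumption: start > stop marks an ORF on the complementary strand.
    """
    if start <= stop:
        return contig[start - 1:stop]
    # Complementary strand: the listed region is the antisense strand, read 3'-5'.
    return reverse_complement(contig[stop - 1:start])


if __name__ == "__main__":
    toy_contig = "CC" + "TTAGGGCAT" + "CC"
    print(orf_sequence(toy_contig, 11, 3))  # ATGCCCTAA
```
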

Column 5 provides a database accession number for a homologous protein identified by a similarity search of public sequence databases (see, infra). Column 6 describes the matching protein sequence, and the source organism is identified in brackets. Column 7 of Tables 1 and 3 indicates the percent similarity of the protein sequence encoded by an ORF to the corresponding protein sequence from the organism appearing in parentheses in the sixth column. Column 8 of Tables 1 and 3 indicates the percent identity of the protein sequence encoded by an ORF to the corresponding protein sequence from the organism appearing in parentheses in the sixth column. The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art and are described in more detail below. Identified genes can frequently be assigned a putative cellular role category adapted from Riley (see Riley, M., Microbiol. Rev. 57:862 (1993)). Column 9 of Tables 1 and 3 provides the nucleotide length of the open reading frame.
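
The difference between percent similarity (Column 7) and percent identity (Column 8) can be illustrated with the Python sketch below. Identity counts aligned positions carrying the same amino acid; similarity additionally counts positions whose residues fall into the same group of chemically related amino acids. The residue grouping and the choice to score only alignment columns without a gap are simplifying assumptions for illustration. The values in the Tables come from the similarity searches described in the Examples, which may use a substitution matrix instead.

```python
from typing import Tuple

# Illustrative groups of chemically related residues (an assumption for this
# sketch, not the scoring behind Columns 7 and 8 of Tables 1 and 3).
SIMILAR_GROUPS = ["AGST", "DENQ", "KRH", "ILMV", "FWY", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(a_aln: str, b_aln: str) -> Tuple[float, float]:
    """Percent identity and percent similarity of two aligned protein sequences.

    Only alignment columns in which neither sequence has a gap ('-') are scored.
    """
    if len(a_aln) != len(b_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a_aln.upper(), b_aln.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free alignment columns to score")
    identical = sum(1 for x, y in pairs if x == y)
    similar = sum(
        1 for x, y in pairs
        if x == y or (x in GROUP_OF and y in GROUP_OF and GROUP_OF[x] == GROUP_OF[y])
    )
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


if __name__ == "__main__":
    print(identity_and_similarity("MKIL", "MRVL"))  # (50.0, 100.0)
```

Because every identical pair also counts as similar, percent similarity is always at least as large as percent identity.
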

Tables 2 and 4, below, provide ORFs of E. coli J96 PAI IV and PAI V, respectively, that did not produce a homology match with a known sequence from either E. coli or another organism. As above, the first column in Tables 2 and 4 provides the contig in which the ORF is located, and the second column numerically identifies a particular ORF in a particular contig. Columns 3 and 4 identify an ORF's position in one of SEQ ID NOs: 1 through 142 by reference to start and stop nucleotides.

Tables 5 and 6, below, provide the E. coli J96 PAI IV ORFs and PAI V ORFs, respectively, identified by the present inventors that provided a significant match to a previously published E. coli protein. Columns 1-6 correspond to columns 1-6 appearing in Tables 1 and 3. Column 7 indicates the percent identity of the protein sequence encoded by an ORF to the corresponding protein sequence from the organism appearing in parentheses in the sixth column. Column 8 indicates the length of the high-scoring segment pair (HSP). Column 9 provides the nucleotide length of the open reading frame.

As used herein, "open reading frame" or "ORF" refers to the nucleotide 
sequences as described in Tables 1 through 6. In Tables 1 through 6, each ORF 
is designated by a nucleotide sequence start position and stop position according 
to numbering of contig nucleotides in the Sequence Listing provided (Contig ID 
= SEQ ID NO). 

In a first embodiment, the invention comprises a nucleotide sequence described in Tables 1 through 4 which begins with the nucleotide following the last nucleotide of an upstream stop codon (the first nucleotide of the "ORF") and includes an initiation codon, in-frame putative polypeptide-encoding sequence, and the nucleotides of an in-frame stop codon.

In a second embodiment, the invention comprises a nucleotide sequence
of Tables 1 through 4 which contains an initiation codon (e.g., a methionine or
valine codon) at its 5' end and a stop codon at its 3' end. The sequences of
this embodiment are present within the nucleotide sequence described in Tables
1 through 4 by start and stop position as numbered in the Sequence Listing. To
determine the 5' start position of this embodiment, one simply reads 5' to 3' from
the designated 5' end position until an initiation codon is found.
In a third embodiment, the invention comprises a nucleotide sequence of 
the second embodiment, except that the 3' stop codon is not present. 
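The reading rule of the second and third embodiments can be sketched in Python as follows. This is a minimal illustration, not the inventors' procedure: it assumes 1-based inclusive start/stop coordinates, assumes that a start position greater than the stop position denotes an ORF on the opposite strand (a convention the tables do not state), scans in frame from the designated 5' end, and treats only ATG (methionine) and GTG (valine) as initiation codons. The helper names `trim_orf` and `reverse_complement` are hypothetical.

```python
START_CODONS = {"ATG", "GTG"}        # methionine and valine codons named in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard bacterial stop codons

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_orf(contig: str, start: int, stop: int, keep_stop: bool = True) -> str:
    """Trim a tabulated ORF so that it begins at its first in-frame initiation codon.

    start/stop are 1-based inclusive contig positions (assumption); start > stop is
    assumed to mean the ORF lies on the opposite strand.
    keep_stop=False gives the third embodiment (3' stop codon removed).
    """
    if start <= stop:
        orf = contig[start - 1:stop].upper()
    else:
        orf = reverse_complement(contig[stop - 1:start]).upper()
    # Read 5' to 3' codon by codon until an initiation codon is found.
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in START_CODONS:
            trimmed = orf[i:]
            break
    else:
        raise ValueError("no in-frame initiation codon found")
    if not keep_stop and trimmed[-3:] in STOP_CODONS:
        trimmed = trimmed[:-3]
    return trimmed


# Toy contig: an upstream TAG stop codon, then an ORF running from position 4 to 18.
contig = "TAGCCCATGAAATTTTAA"
print(trim_orf(contig, 4, 18))                   # ATGAAATTTTAA
print(trim_orf(contig, 4, 18, keep_stop=False))  # ATGAAATTT
```

Scanning in frame (three bases at a time) rather than base by base is a design choice that keeps the trimmed sequence in the same reading frame as the tabulated ORF.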

In a fourth embodiment, the invention comprises a nucleotide sequence 
encoding a putative protein which is a sequence of Tables 1 through 4 excluding 
sequence encoding amino acids subject to removal by post-translational 
processing and sequences 3' of the last codon coding for an amino acid present 
in the putative polypeptide (e.g., sequences not containing the stop codon and 
encoding the mature form of the polypeptide). 

Certain embodiments of the invention may therefore either include or 
exclude initiation codons for methionine or valine and either include or exclude 
the stop codon. 

Further details concerning the algorithms and criteria used for homology 
searches are provided in the Examples below. A skilled artisan can readily 
identify ORFs in the Escherichia coli J96 cosmid library other than those listed 
in Tables 1 through 6, such as ORFs that are overlapping or encoded by the 
opposite strand of an identified ORF in addition to those ascertainable using the 
computer-based systems of the present invention. 

Isolated nucleic acid molecules of the present invention include DNA
molecules having a nucleotide sequence substantially different from the nucleotide
sequence of an ORF described in Tables 1 through 4, but which, due to the
degeneracy of the genetic code, still encode an E. coli J96 PAI protein. The
genetic code is well known in the art. Thus, it would be routine to generate such
degenerate variants.
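As an illustration of why such variants are routine to generate, the Python sketch below builds the standard genetic code, replaces each codon with a synonymous codon where one exists, and confirms that the encoded protein is unchanged. The codon-choice rule (first synonymous alternative in table order) is an arbitrary assumption made for the example; `translate` and `degenerate_variant` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid (or stop, "*") they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def degenerate_variant(seq: str) -> str:
    """Swap every codon for a synonymous codon where one exists (ATG and TGG have none)."""
    out = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        alternatives = [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]
        out.append(alternatives[0] if alternatives else codon)
    return "".join(out)


orf = "ATGAAATTTTAA"
variant = degenerate_variant(orf)
print(variant, translate(orf), translate(variant))  # ATGAAGTTCTAG MKF* MKF*
```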

The present invention further relates to variants of the nucleic acid 
molecules of the present invention, which encode portions, analogs or derivatives 
of an E. coli protein encoded by an ORF described in Tables 1 through 4.
Non-naturally occurring variants may be produced using art-known mutagenesis 
techniques and include those produced by nucleotide substitutions, deletions or 
additions. The substitutions, deletions or additions may involve one or more 
nucleotides. The variants may be altered in coding regions, non-coding regions,
or both. Alterations in the coding regions may produce conservative or 
non-conservative amino acid substitutions, deletions or additions. Especially 
preferred among these are silent substitutions, additions and deletions, which do 
not alter the properties and activities of the E. coli protein or portions thereof. 
Also especially preferred in this regard are conservative substitutions. 

Further embodiments of the invention include isolated nucleic acid 
molecules comprising a polynucleotide having a nucleotide sequence at least 90% 
identical, and more preferably at least 95%, 96%, 97%, 98% or 99% identical, to 
the nucleotide sequence of an ORF described in Tables 1 through 6, preferably
Tables 1 through 4. By a polynucleotide having a nucleotide sequence at least, for
example, 95% identical to the reference E. coli ORF nucleotide sequence is 
intended that the nucleotide sequence of the polynucleotide is identical to the 
reference sequence except that the polynucleotide sequence may include up to 
five point mutations per each 100 nucleotides of the ORF sequence. In other 
words, to obtain a polynucleotide having a nucleotide sequence at least 95% 
identical to a reference ORF nucleotide sequence, up to 5% of the nucleotides in 
the reference sequence may be deleted or substituted with another nucleotide, or 
a number of nucleotides up to 5% of the total nucleotides in the reference 
sequence may be inserted into the reference sequence. These mutations of the 
reference sequence may occur at the 5' or 3' terminal positions of the reference
nucleotide sequence or anywhere between those terminal positions, interspersed 
either individually among nucleotides in the reference sequence or in one or more 
contiguous groups within the reference sequence. 
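The counting rule in this definition can be checked mechanically. The Python sketch below assumes that a pairwise alignment is already available as two equal-length strings with '-' marking gaps (how the alignment is produced is outside the sketch). Each substitution, deletion or insertion column counts as one alteration, and the allowed number is the stated percentage of the ungapped reference length, rounded down. The function names are hypothetical.

```python
def count_alterations(ref_aln: str, query_aln: str) -> int:
    """Count substitutions, deletions and insertions in an aligned pair ('-' = gap)."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    # Any differing column is one alteration: base/base = substitution,
    # base/'-' = deletion from the reference, '-'/base = insertion into it.
    return sum(1 for r, q in zip(ref_aln, query_aln) if r != q)


def meets_identity(ref_aln: str, query_aln: str, identity_pct: int) -> bool:
    """True if alterations do not exceed (100 - identity_pct)% of the reference length."""
    ref_length = len(ref_aln.replace("-", ""))
    allowed = ref_length * (100 - identity_pct) // 100  # e.g. 5 per 100 nt at 95%
    return count_alterations(ref_aln, query_aln) <= allowed


reference = "ACGT" * 25                      # 100-nt toy reference
query = list(reference)
for pos in (0, 10, 20, 30, 40):              # introduce five point substitutions
    query[pos] = "T"
query = "".join(query)
print(meets_identity(reference, query, 95))  # True  (5 alterations, 5 allowed)
print(meets_identity(reference, query, 96))  # False (5 alterations, 4 allowed)
```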

As a practical matter, whether any particular nucleic acid molecule is at
least 90%, 95%, 96%, 97%, 98% or 99% identical to the nucleotide sequence of
an E. coli J96 PAI ORF can be determined conventionally using known computer
programs such as the Bestfit program (Wisconsin Sequence Analysis Package,
Version 8 for Unix, Genetics Computer Group, University Research Park, 575
Science Drive, Madison, WI 53711). Bestfit uses the local homology algorithm
of Smith and Waterman, Advances in Applied Mathematics 2:482-489 (1981),
to find the best segment of homology between two sequences. When using
Bestfit or any other sequence alignment program to determine whether a
particular sequence is, for instance, 95% identical to a reference sequence
according to the present invention, the parameters are set, of course, such that the
percentage of identity is calculated over the full length of the reference nucleotide
sequence and that gaps in homology of up to 5% of the total number of
nucleotides in the reference sequence are allowed.
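For readers without access to Bestfit, the sketch below shows the core of the Smith-Waterman local alignment it relies on, followed by an identity calculation taken over the full length of the reference, as the paragraph requires. It is an illustrative stand-in, not Bestfit. It uses a simple linear gap penalty and match/mismatch scores chosen for the example (match +2, mismatch -1, gap -2), and it does not enforce the 5% gap allowance. Bestfit's own scoring parameters and gap model differ.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    aln_a, aln_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            aln_a.append(a[i - 1])
            aln_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aln_a.append(a[i - 1])
            aln_b.append("-")
            i -= 1
        else:
            aln_a.append("-")
            aln_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aln_a)), "".join(reversed(aln_b))


def percent_identity(reference: str, query: str) -> float:
    """Identical aligned positions divided by the FULL reference length, times 100."""
    _, ref_aln, query_aln = smith_waterman(reference, query)
    matches = sum(1 for r, q in zip(ref_aln, query_aln) if r == q and r != "-")
    return 100.0 * matches / len(reference)


reference = "ACGT" * 25
print(percent_identity(reference, reference))       # 100.0
print(percent_identity(reference, reference[:95]))  # 95.0
```

Dividing by the full reference length, rather than by the length of the local alignment, means that unaligned stretches of the reference lower the identity. This is what the "calculated over the full length of the reference" condition requires.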

Preferred are nucleic acid molecules having sequences at least 90%, 95%,
96%, 97%, 98% or 99% identical to the nucleic acid sequence of an E. coli J96
PAI ORF that encode a functional polypeptide. By a "functional polypeptide" is
intended a polypeptide exhibiting activity similar, but not necessarily identical,
to an activity of the protein encoded by the E. coli J96 PAI ORF. For example,
the E. coli ORF [Contig ID 84, ORF ID 3 (84/3)] encodes a hemolysin. Thus, a
"functional polypeptide" encoded by a nucleic acid molecule having a nucleotide
sequence, for example, 95% identical to the nucleotide sequence of 84/3, will also
possess hemolytic activity. As the skilled artisan will appreciate, assays for
determining whether a particular polypeptide is "functional" will depend on
which ORF is used as the reference sequence. Depending on the reference ORF,
the assay chosen for measuring polypeptide activity will be readily apparent in
light of the role categories provided in Tables 1, 3, 5 and 6.

Of course, due to the degeneracy of the genetic code, one of ordinary skill
in the art will immediately recognize that a large number of the nucleic acid
molecules having a sequence at least 90%, 95%, 96%, 97%, 98%, or 99%
identical to the nucleic acid sequence of a reference ORF will encode a functional
polypeptide. In fact, since degenerate variants all encode the same amino acid
sequence, this will be clear to the skilled artisan even without performing a
comparison assay for protein activity. It will be further recognized in the art that,
for such nucleic acid molecules that are not degenerate variants, a reasonable
number will also encode a functional polypeptide. This is because the skilled
artisan is fully aware of amino acid substitutions that are either less likely or not
likely to significantly affect protein function (e.g., replacing one aliphatic amino
acid with a second aliphatic amino acid).

For example, guidance concerning how to make phenotypically silent
amino acid substitutions is provided in Bowie, J. U. et al., "Deciphering the
Message in Protein Sequences: Tolerance to Amino Acid Substitutions," Science
247:1306-1310 (1990), wherein the authors indicate that there are two main
approaches for studying the tolerance of an amino acid sequence to change. The
first method relies on the process of evolution, in which mutations are either
accepted or rejected by natural selection. The second approach uses genetic
engineering to introduce amino acid changes at specific positions of a cloned gene
and selections or screens to identify sequences that maintain functionality. As the
authors state, these studies have revealed that proteins are surprisingly tolerant of
amino acid substitutions. The authors further indicate which amino acid changes
are likely to be permissive at a certain position of the protein. For example, most
buried amino acid residues require nonpolar side chains, whereas few features of
surface side chains are generally conserved. Other such phenotypically silent
substitutions are described in Bowie, J.U. et al., supra, and the references cited
therein.

The present invention is further directed to fragments of the isolated 
nucleic acid molecules described herein. By a fragment of an isolated nucleic 
acid molecule having the nucleotide sequence of an E. coli J96 PAI ORF is 
intended fragments at least about 15 nt, and more preferably at least about 20 nt, 
still more preferably at least about 30 nt, and even more preferably, at least about 
40 nt in length that are useful as diagnostic probes and primers as discussed 
herein. Of course, larger fragments 50-500 nt in length are also useful according 
to the present invention as are fragments corresponding to most, if not all, of the 
nucleotide sequence of an E. coli J96 PAI ORF. By a fragment at least 20 nt in 
length, for example, is intended fragments that include 20 or more contiguous 
bases from the nucleotide sequence of an E. coli J96 PAI ORF. Since E. coli 
ORFs are listed in Tables 1 through 6 and the sequences of the ORFs have been 
provided within the contig sequences of SEQ ID NOs: 1 through 142, generating 
such DNA fragments would be routine to the skilled artisan. For example,
restriction endonuclease cleavage or shearing by sonication could easily be used
to generate fragments of various sizes from the PAI DNA that is incorporated into
the deposited pools of cosmid clones. Alternatively, such fragments could be
generated synthetically.
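When fragments are designed in silico before synthesis, a sliding window offers a simple way to enumerate candidate fragments of a chosen length from an ORF. The Python sketch below is illustrative only. The default 20-nt length and 10-nt step are arbitrary choices within the ranges given above, and `tile_fragments` is a hypothetical name.

```python
def tile_fragments(seq: str, length: int = 20, step: int = 10):
    """Return (1-based start, fragment) pairs of contiguous bases taken from seq."""
    if length < 15:
        raise ValueError("fragments described here are at least about 15 nt")
    if step < 1:
        raise ValueError("step must be positive")
    return [(i + 1, seq[i:i + length]) for i in range(0, len(seq) - length + 1, step)]


orf = "ACGTACGTAC" * 3  # toy 30-nt sequence
for start, fragment in tile_fragments(orf, length=20, step=5):
    print(start, fragment)
```

A step smaller than the fragment length gives overlapping fragments. A step equal to the length gives non-overlapping ones.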

Preferred nucleic acid fragments of the present invention include nucleic
acid molecules encoding epitope-bearing portions of an E. coli J96 PAI protein.
Methods for determining such epitope-bearing portions are described in detail
below.

In another aspect, the invention provides an isolated nucleic acid molecule
comprising a polynucleotide that hybridizes under stringent hybridization
conditions to a portion of the polynucleotide in a nucleic acid molecule of the
invention described above, for instance, an ORF described in Tables 1 through
6, preferably an ORF described in Tables 1, 2, 3 or 4. By "stringent hybridization
conditions" is intended overnight incubation at 42°C in a solution comprising:
50% formamide, 5x SSC (1x SSC is 150 mM NaCl, 15 mM trisodium citrate), 50 mM
sodium phosphate (pH 7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20
μg/ml denatured, sheared salmon sperm DNA, followed by washing the filters in
0.1x SSC at about 65°C.

By a polynucleotide that hybridizes to a "portion" of a polynucleotide is 
intended a polynucleotide (either DNA or RNA) hybridizing to at least about 15 
nucleotides (nt), and more preferably at least about 20 nt, still more preferably at 
least about 30 nt, and even more preferably about 30-70 nt of the reference 
polynucleotide. These are useful as diagnostic probes and primers as discussed 
above and in more detail below. 

Of course, polynucleotides hybridizing to a larger portion of the reference
polynucleotide (e.g., an E. coli ORF), for instance, a portion 50-500 nt in length,
or even to the entire length of the reference polynucleotide, are also useful as
probes according to the present invention, as are polynucleotides corresponding
to most, if not all, of an E. coli J96 PAI ORF.
By "expression modulating fragment" (EMF), is intended a series of
nucleotides that modulate the expression of an operably linked, putative 
polypeptide-encoding region (encoding region). A sequence is said to "modulate 
the expression of an operably linked sequence" when the expression of the 
sequence is altered by the presence of the EMF. EMFs include, but are not 
limited to, promoters, and promoter modulating sequences (inducible elements). 
One class of EMFs are fragments that induce the expression of an operably linked 
encoding region in response to a specific regulatory factor or physiological event. 
EMF sequences can be identified within the E. coli genome by their proximity to 
the encoding regions within ORFs described in Tables 1 through 6. An intergenic 
segment, or a fragment of the intergenic segment, from about 10 to 200 
nucleotides in length, taken 5' from any one of the encoding regions of ORFs of 
Tables 1 through 6 will modulate the expression of an operably linked 3' 
encoding region in a fashion similar to that found within the naturally linked ORF 
sequence. As used herein, an "intergenic segment" refers to the fragments of the 
E. coli J96 PAI subgenome that are between two encoding regions herein 
described. Alternatively, EMFs can be identified using known EMFs as a target 
sequence or target motif in the computer-based systems of the present invention. 
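Candidate EMF segments can be extracted from a contig with a few lines of code. The Python sketch below is a hypothetical helper, not part of the disclosed computer-based systems. It assumes forward-strand ORFs given as 1-based inclusive (start, stop) pairs and returns up to 200 nt immediately 5' of a chosen encoding region. Extraction stops at the end of the nearest upstream ORF, and segments shorter than 10 nt are rejected. An upstream ORF that overlaps the target is not handled.

```python
from typing import List, Optional, Tuple


def upstream_intergenic(contig: str, orfs: List[Tuple[int, int]], target_start: int,
                        max_len: int = 200, min_len: int = 10) -> Optional[str]:
    """Return the intergenic segment 5' of the ORF that begins at target_start.

    orfs: forward-strand (start, stop) pairs, 1-based inclusive (assumption).
    Returns None when the available segment is shorter than min_len.
    """
    # Last nucleotide of the closest ORF that ends before the target begins.
    boundary = max((stop for _start, stop in orfs if stop < target_start), default=0)
    seg_start = max(boundary + 1, target_start - max_len)
    segment = contig[seg_start - 1:target_start - 1]
    return segment if len(segment) >= min_len else None


contig = "G" * 400  # toy contig
print(len(upstream_intergenic(contig, [(1, 50), (301, 400)], 301)))   # 200
print(len(upstream_intergenic(contig, [(1, 250), (301, 400)], 301)))  # 50
print(upstream_intergenic(contig, [(1, 295), (301, 400)], 301))       # None
```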

The presence and activity of an EMF can be confirmed using an EMF trap 
vector. An EMF trap vector contains a cloning site 5' to a marker sequence. A 
marker sequence encodes an identifiable phenotype, such as antibiotic resistance 
or a complementing nutrition auxotrophic factor, which can be identified or 
assayed when the EMF trap vector is placed within an appropriate host under 
appropriate conditions. As described above, an EMF will modulate the 
expression of an operably linked marker sequence. A more detailed discussion 
of various marker sequences is provided below. 

A sequence that is suspected as being an EMF is cloned in all three 
reading frames in one or more restriction sites upstream from the marker 
sequence in the EMF trap vector. The vector is then transformed into an 
appropriate host using known procedures and the phenotype of the transformed 
host is examined under appropriate conditions. As described above, an EMF will
modulate the expression of an operably linked marker sequence.

By a "diagnostic fragment" (DF), is intended a series of nucleotides that 
selectively hybridize to E. coli sequences. DFs can be readily identified by 
identifying unique sequences within the E. coli J96 PAI subgenome, or by
generating and testing probes or amplification primers consisting of the DF
sequence in an appropriate diagnostic format for amplification or hybridization 
selectivity. 
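The first route, identifying unique sequences, can be sketched as an exact k-mer comparison. In the Python sketch below, the background set (for example, genomes of non-pathogenic strains) is an assumption supplied by the user. Both strands of the background are screened, and "unique" means only the absence of an exact match. Real diagnostic selectivity must still be confirmed in the hybridization or amplification format, as the paragraph says. `unique_kmers` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(target: str, background: Iterable[str], k: int = 20) -> List[Tuple[int, str]]:
    """Return (1-based position, k-mer) pairs of target absent from both background strands."""
    seen: Set[str] = set()
    for seq in background:
        seen |= kmers(seq, k) | kmers(reverse_complement(seq), k)
    return [(i + 1, target[i:i + k]) for i in range(len(target) - k + 1)
            if target[i:i + k] not in seen]


print(unique_kmers("AAAACCCC", ["AAAAGGGG"], k=4))
# [(2, 'AAAC'), (3, 'AACC'), (4, 'ACCC')]
```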

Each of the ORFs of the E. coli J96 PAI subgenome disclosed in Tables
1 through 4, and EMFs found 5' to the encoding regions of the ORFs, can be used
in numerous ways as polynucleotide reagents. The sequences can be used as
diagnostic probes or diagnostic amplification primers to detect the presence of
uropathogenic E. coli in a sample. This is especially the case with the fragments
or ORFs of Tables 2 and 4, which will be highly selective for uropathogenic E. coli
J96, and perhaps other uropathogenic or extraintestinal strains that include one
or more PAIs.

In addition, the fragments of the present invention, as broadly described, 
can be used to control gene expression through triple helix formation or antisense 
DNA or RNA, both of which methods are based on the binding of a 
polynucleotide sequence to DNA or RNA. Polynucleotides suitable for use in 
these methods are usually 20 to 40 bases in length and are designed to be 
complementary to a region of the gene involved in transcription (triple helix - see 
Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988);
and Dervan et al., Science 257:1360 (1991)) or to the mRNA itself (antisense -
Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense 
Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). 

Triple-helix formation optimally results in a shut-off of RNA
transcription from DNA, while antisense RNA hybridization blocks translation
of an mRNA molecule into polypeptide. Both techniques have been
demonstrated to be effective in model systems. Information contained in the
sequences of the present invention is necessary for the design of an antisense or
triple helix oligonucleotide.
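For the antisense case, the core design step is taking the reverse complement of a 20- to 40-base stretch of the target mRNA. The Python sketch below shows only that step and writes the result as an oligodeoxynucleotide complementary to the sense-strand sequence (DNA or RNA input). Target-site selection, oligo chemistry and any triple-helix design rules are outside its scope. `antisense_oligo` is a hypothetical name.

```python
# Complement table accepting DNA (T) or RNA (U) input; output is DNA.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_oligo(sense: str, start: int, length: int = 20) -> str:
    """Antisense DNA oligo complementary to sense[start : start + length] (1-based start)."""
    if not 20 <= length <= 40:
        raise ValueError("length outside the usual 20-40 base range")
    target = sense.upper()[start - 1:start - 1 + length]
    if len(target) != length:
        raise ValueError("target region runs past the end of the sequence")
    # Reverse complement so that the oligo pairs antiparallel with the target.
    return target.translate(_COMPLEMENT)[::-1]


print(antisense_oligo("ATGC" * 10, start=1, length=20))  # GCATGCATGCATGCATGCAT
```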

Vectors and Host Cells 

The present invention further provides recombinant constructs comprising
one or more fragments of the E. coli J96 PAIs. The recombinant constructs of the
present invention comprise a vector, such as a plasmid or viral vector, into which,
for example, an E. coli J96 PAI ORF is inserted. The vector may further
comprise regulatory sequences, including for example, a promoter, operably
linked to the encoding region of an ORF. For vectors comprising the EMFs of
the present invention, the vector may further comprise a marker sequence or
heterologous ORF operably linked to the EMF. Large numbers of suitable
vectors and promoters are known to those of skill in the art and are commercially
available for generating the recombinant constructs of the present invention. The
following vectors are provided by way of example. Bacterial: pBs, phagescript,
PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a
(Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia).
Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV,
pMSG, pSVL (Pharmacia).

Promoter regions can be selected from any desired gene using CAT
(chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.
Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial
promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic
promoters include CMV immediate early, HSV thymidine kinase, early and late
SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the
appropriate vector and promoter is well within the level of ordinary skill in the
art.

The present invention further provides host cells containing any one of the
isolated fragments (preferably an ORF) of the E. coli J96 PAIs described herein.
The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a
lower eukaryotic host cell, such as a yeast cell, or the host cell can be a
prokaryotic cell, such as a bacterial cell. Introduction of the recombinant
construct into the host cell can be effected by calcium phosphate transfection,
DEAE-dextran mediated transfection, or electroporation (Davis, L. et al., Basic
Methods in Molecular Biology (1986)). Host cells containing, for example, an
E. coli J96 PAI ORF can be used conventionally to produce the encoded protein.

Polypeptides and Fragments 

The invention further provides isolated polypeptides having the amino 
acid sequence encoded by an E. coli PAI ORF described in Tables 1 through 6,
preferably Tables 1 through 4, or a peptide or polypeptide comprising a portion 
of the above polypeptides. The terms "peptide" and "oligopeptide" are considered 
synonymous (as is commonly recognized) and each term can be used 
interchangeably as the context requires to indicate a chain of at least two amino 
acids coupled by peptidyl linkages. The word "polypeptide" is used herein for 
chains containing more than ten amino acid residues. All oligopeptide and 
polypeptide formulas or sequences herein are written from left to right and in the 
direction from amino terminus to carboxy terminus. 

It will be recognized in the art that some amino acid sequences of E. coli 
polypeptides can be varied without significant effect on the structure or function
of the protein. If such differences in sequence are contemplated, it should be 
remembered that there will be critical areas on the protein which determine 
activity. In general, it is possible to replace residues which form the tertiary 
structure, provided that residues performing a similar function are used. In other 
instances, the type of residue may be completely unimportant if the alteration 
occurs at a non-critical region of the protein. 

Thus, the invention further includes variations of polypeptides encoded
by ORFs listed in Tables 1 through 6 which show substantial pathogenic
activity or which include regions of particular E. coli PAI proteins such as the
protein portions discussed below. Such mutants include deletions, insertions,
inversions, repeats, and type substitutions (for example, substituting one
hydrophilic residue for another, but not strongly hydrophilic for strongly
hydrophobic as a rule). Small changes or such "neutral" amino acid substitutions
will generally have little effect on activity.
Typically seen as conservative substitutions are the replacements, one for
another, among the aliphatic amino acids Ala, Val, Leu and Ile; interchange of the
hydroxyl residues Ser and Thr; exchange of the acidic residues Asp and Glu;
substitution between the amide residues Asn and Gln; exchange of the basic
residues Lys and Arg; and replacements among the aromatic residues Phe and Tyr.

As indicated in detail above, further guidance concerning which amino
acid changes are likely to be phenotypically silent (i.e., are not likely to have a
significant deleterious effect on a function) can be found in Bowie, J.U., et al.,
"Deciphering the Message in Protein Sequences: Tolerance to Amino Acid
Substitutions," Science 247:1306-1310 (1990).

Thus, the fragment, derivative or analog of a polypeptide encoded by an
ORF described in one of Tables 1 through 6 may be (i) one in which one or more
of the amino acid residues are substituted with a conserved or non-conserved
amino acid residue (preferably a conserved amino acid residue) and such
substituted amino acid residue may or may not be one encoded by the genetic
code, or (ii) one in which one or more of the amino acid residues includes a
substituent group, or (iii) one in which the mature polypeptide is fused with
another compound, such as a compound to increase the half-life of the
polypeptide (for example, polyethylene glycol), or (iv) one in which
additional amino acids are fused to the mature polypeptide, such as an IgG Fc
fusion region peptide or leader or secretory sequence or a sequence which is
employed for purification of the mature polypeptide or a proprotein sequence.
Such fragments, derivatives and analogs are deemed to be within the scope of
those skilled in the art from the teachings herein.

Of particular interest are substitutions of charged amino acids with
another charged amino acid and with neutral or negatively charged amino acids.
The latter results in proteins with reduced positive charge to improve the
characteristics of said proteins. The prevention of aggregation is highly desirable.
Aggregation of proteins not only results in a loss of activity but can also be
problematic when preparing pharmaceutical formulations, because aggregates can be
immunogenic (Pinckard et al., Clin. Exp. Immunol. 2:331-340 (1967); Robbins
et al., Diabetes 36:838-845 (1987); Cleland et al., Crit. Rev. Therapeutic Drug
Carrier Systems 10:307-377 (1993)).

The replacement of amino acids can also change the selectivity of binding
to cell surface receptors. Ostade et al., Nature 361:266-268 (1993) describes
certain mutations resulting in selective binding of TNF-α to only one of the two
known types of TNF receptors. Thus, proteins encoded by the ORFs listed in
Tables 1, 2, 3, 4, 5, or 6, and that bind to a cell surface receptor, may include one
or more amino acid substitutions, deletions or additions, either from natural
mutations or human manipulation.

As indicated, changes are preferably of a minor nature, such as 
conservative amino acid substitutions that do not significantly affect the folding 
or activity of the protein (see Table 7). 



TABLE 7. Conservative Amino Acid Substitutions

Aromatic      Phenylalanine, Tryptophan, Tyrosine
Hydrophobic   Leucine, Isoleucine, Valine
Polar         Glutamine, Asparagine
Basic         Arginine, Lysine, Histidine
Acidic        Aspartic Acid, Glutamic Acid
Small         Alanine, Serine, Threonine, Methionine, Glycine
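The groupings of Table 7 translate directly into a lookup that flags whether a proposed single-residue substitution stays within one class. The Python sketch below encodes Table 7 as written, using one-letter codes. A substitution counts as conservative only if both residues share a Table 7 group, so residues absent from the table (cysteine, proline) are never classed as conservative replacements. `is_conservative` is a hypothetical name.

```python
# Table 7 groupings, one-letter amino acid codes.
TABLE_7 = {
    "Aromatic": {"F", "W", "Y"},
    "Hydrophobic": {"L", "I", "V"},
    "Polar": {"Q", "N"},
    "Basic": {"R", "K", "H"},
    "Acidic": {"D", "E"},
    "Small": {"A", "S", "T", "M", "G"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same Table 7 group (or are identical)."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in TABLE_7.values())


print(is_conservative("L", "I"), is_conservative("D", "K"))  # True False
```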
Amino acids in the proteins encoded by ORFs of the present invention
that are essential for function can be identified by methods known in the art, such
as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham and
Wells, Science 244:1081-1085 (1989)). The latter procedure introduces single
alanine mutations at every residue in the molecule. The resulting mutant
molecules are then tested for biological activity such as receptor binding or in
vitro proliferative activity. Sites that are critical for ligand-receptor
binding can also be determined by structural analysis such as crystallization,
nuclear magnetic resonance or photoaffinity labeling (Smith et al., J. Mol. Biol.
224:899-904 (1992) and de Vos et al., Science 255:306-312 (1992)).
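The mutant panel for an alanine scan is easy to enumerate before any laboratory work. The Python sketch below (a hypothetical helper) produces one single-alanine mutant per non-alanine position, labelled in the usual original-residue/position/replacement notation. Positions that are already alanine are skipped, since no mutation would result.

```python
from typing import List, Tuple


def alanine_scan(protein: str) -> List[Tuple[str, str]]:
    """Return (mutation label, mutant sequence) for every non-Ala position (1-based)."""
    mutants = []
    for pos, residue in enumerate(protein, start=1):
        if residue == "A":
            continue  # already alanine; nothing to substitute
        mutant = protein[:pos - 1] + "A" + protein[pos:]
        mutants.append((f"{residue}{pos}A", mutant))
    return mutants


print(alanine_scan("MKA"))  # [('M1A', 'AKA'), ('K2A', 'MAA')]
```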

The polypeptides of the present invention are preferably provided in an 
isolated form, and preferably are substantially purified. A recombinantly 
produced version of the polypeptides can be substantially purified by the one-step 
method described in Smith and Johnson, Gene 67:31-40 (1988).

The polypeptides of the present invention include the polypeptide encoded
by the ORFs listed in Tables 1-6, preferably Tables 1-4, as well as polypeptides
which have at least 90% similarity, more preferably at least 95% similarity, and
still more preferably at least 96%, 97%, 98% or 99% similarity to those described
above, and also include portions of such polypeptides with at least 30 amino acids
and more preferably at least 50 amino acids.

By "% similarity" for two polypeptides is intended a similarity score 
produced by comparing the amino acid sequences of the two polypeptides using
the Bestfit program (Wisconsin Sequence Analysis Package, Version 8 for Unix,
Genetics Computer Group, University Research Park, 575 Science Drive,
Madison, WI 53711) and the default settings for determining similarity. Bestfit
uses the local homology algorithm of Smith and Waterman (Advances in Applied
Mathematics 2:482-489, 1981) to find the best segment of similarity between two
sequences.

By a polypeptide having an amino acid sequence at least, for example,
95% "identical" to a reference amino acid sequence of a polypeptide is intended
that the amino acid sequence of the polypeptide is identical to the reference
sequence except that the polypeptide sequence may include up to five amino acid
alterations per each 100 amino acids of the reference amino acid sequence of said
polypeptide. In other words, to obtain a polypeptide having an amino acid 
sequence at least 95% identical to a reference amino acid sequence, up to 5% of 
the amino acid residues in the reference sequence may be deleted or substituted 
with another amino acid, or a number of amino acids up to 5% of the total amino 
acid residues in the reference sequence may be inserted into the reference 
sequence. These alterations of the reference sequence may occur at the amino or 
carboxy terminal positions of the reference amino acid sequence or anywhere 
between those terminal positions, interspersed either individually among residues 
in the reference sequence or in one or more contiguous groups within the 
reference sequence. 

As a practical matter, whether any particular polypeptide is at least 90%,
95%, 96%, 97%, 98% or 99% identical to, for instance, the amino acid sequence
encoded by the ORFs listed in Tables 1, 2, 3, 4, 5, or 6 can be determined
conventionally using known computer programs such as the Bestfit program
(Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer
Group, University Research Park, 575 Science Drive, Madison, WI 53711). When
using Bestfit or any other sequence alignment program to determine whether a
particular sequence is, for instance, 95% identical to a reference sequence
according to the present invention, the parameters are set, of course, such that the
percentage of identity is calculated over the full length of the reference amino
acid sequence and that gaps in homology of up to 5% of the total number of
amino acid residues in the reference sequence are allowed.

The polypeptide of the present invention could be used as a molecular 
weight marker on SDS-PAGE gels or on molecular sieve gel filtration columns 
using methods well known to those of skill in the art. 
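To use a polypeptide as a size marker, its expected molecular weight is usually estimated from its sequence first. The Python sketch below sums approximate average residue masses (in daltons) and adds one water molecule for the free termini. It ignores post-translational modifications and anomalous migration on SDS-PAGE. The mass values are standard approximations, not figures taken from this document, and `average_mass` is a hypothetical name.

```python
# Approximate average residue masses in daltons (residue = amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Estimate the average molecular mass (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


print(round(average_mass("G"), 2))  # 75.07 (free glycine)
```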

As described in detail below, the polypeptides of the present invention can
also be used to raise polyclonal and monoclonal antibodies, which are useful in
assays for detecting pathogenic protein expression as described below or as
agonists and antagonists capable of enhancing or inhibiting protein function of
important proteins encoded by the ORFs of the present invention. Further, such
polypeptides can be used in the yeast two-hybrid system to "capture" protein
binding proteins which are also candidate agonists and antagonists according to the
present invention. The yeast two-hybrid system is described in Fields and Song,
Nature 340:245-246 (1989).

In another aspect, the invention provides a peptide or polypeptide
comprising an epitope-bearing portion of a polypeptide of the invention. The
epitope of this polypeptide portion is an immunogenic or antigenic epitope of a
polypeptide of the invention. An "immunogenic epitope" is defined as a part of
a protein that elicits an antibody response when the whole protein is the
immunogen. These immunogenic epitopes are believed to be confined to a few
loci on the molecule. On the other hand, a region of a protein molecule to which
an antibody can bind is defined as an "antigenic epitope." The number of
immunogenic epitopes of a protein generally is less than the number of antigenic
epitopes. See, for instance, Geysen et al., Proc. Natl. Acad. Sci. USA 81:3998-4002
(1983).

As to the selection of peptides or polypeptides bearing an antigenic
epitope (i.e., that contain a region of a protein molecule to which an antibody can
bind), it is well known in the art that relatively short synthetic peptides that
mimic part of a protein sequence are routinely capable of eliciting an antiserum
that reacts with the partially mimicked protein. See, for instance, Sutcliffe, J. G.,
Shinnick, T. M., Green, N. and Lerner, R.A. (1983) Antibodies that react with
predetermined sites on proteins. Science 219:660-666. Peptides capable of
eliciting protein-reactive sera are frequently represented in the primary sequence
of a protein, can be characterized by a set of simple chemical rules, and are
confined neither to immunodominant regions of intact proteins (i.e.,
immunogenic epitopes) nor to the amino or carboxyl terminals. Peptides that are
extremely hydrophobic and those of six or fewer residues generally are ineffective
at inducing antibodies that bind to the mimicked protein; longer peptides,
especially those containing proline residues, usually are effective. Sutcliffe et al.,
supra, at 661. For instance, 18 of 20 peptides designed according to these
guidelines, containing 8-39 residues covering 75% of the sequence of the
influenza virus hemagglutinin HA1 polypeptide chain, induced antibodies that
reacted with the HA1 protein or intact virus; and 12/12 peptides from the MuLV
polymerase and 18/18 from the rabies glycoprotein induced antibodies that
precipitated the respective proteins.

Antigenic epitope-bearing peptides and polypeptides of the invention are
therefore useful to raise antibodies, including monoclonal antibodies, that bind
specifically to a polypeptide of the invention. Thus, a high proportion of
hybridomas obtained by fusion of spleen cells from donors immunized with an
antigenic epitope-bearing peptide generally secrete antibody reactive with the
native protein. Sutcliffe et al., supra, at 663. The antibodies raised by antigenic
epitope-bearing peptides or polypeptides are useful to detect the mimicked
protein, and antibodies to different peptides may be used for tracking the fate of
various regions of a protein precursor which undergoes post-translational
processing. The peptides and anti-peptide antibodies may be used in a variety of
qualitative or quantitative assays for the mimicked protein, for instance in
competition assays, since it has been shown that even short peptides (e.g., about
9 amino acids) can bind and displace the larger peptides in immunoprecipitation
assays. See, for instance, Wilson et al., Cell 37:767-778 (1984) at 777. The
anti-peptide antibodies of the invention also are useful for purification of the
mimicked protein, for instance, by adsorption chromatography using methods
well known in the art.

Antigenic epitope-bearing peptides and polypeptides of the invention
designed according to the above guidelines preferably contain a sequence of at
least seven, more preferably at least nine, and most preferably between about 15
and about 30 amino acids contained within the amino acid sequence of a
polypeptide of the invention. However, peptides or polypeptides comprising a
larger portion of an amino acid sequence of a polypeptide of the invention,
containing about 30 to about 50 amino acids, or any length up to and including
the entire amino acid sequence of a polypeptide of the invention, also are
considered epitope-bearing peptides or polypeptides of the invention and also are
useful for inducing antibodies that react with the mimicked protein. Preferably,
the amino acid sequence of the epitope-bearing peptide is selected to provide
substantial solubility in aqueous solvents (i.e., the sequence includes relatively
hydrophilic residues and highly hydrophobic sequences are preferably avoided);
and sequences containing proline residues are particularly preferred.
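The selection guidelines above (a window of at least seven and preferably about 15 to 30 residues, avoidance of highly hydrophobic stretches, preference for proline) can be expressed as a simple sequence scan. The following is a minimal sketch, assuming the Kyte-Doolittle hydropathy scale as the measure of hydrophobicity and an arbitrary cutoff of 0.0 on the mean score (GRAVY); the function names `gravy` and `candidate_epitopes` and the cutoff are illustrative choices, not part of the described method.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Mean Kyte-Doolittle hydropathy (GRAVY) of a peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def candidate_epitopes(protein, length=15, max_gravy=0.0):
    """Return (start, window, gravy, has_proline) tuples for every window of
    `length` residues whose GRAVY is at or below `max_gravy`.

    Proline-containing windows are listed first, then the most hydrophilic.
    The 0.0 cutoff is an illustrative assumption, not a published threshold.
    """
    if length < 7:
        raise ValueError("windows of six or fewer residues are generally ineffective")
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - length + 1):
        window = protein[start:start + length]
        score = gravy(window)
        if score <= max_gravy:
            # start is reported 1-based, as in sequence numbering
            hits.append((start + 1, window, round(score, 2), "P" in window))
    hits.sort(key=lambda hit: (not hit[3], hit[2]))
    return hits
```

Calling `candidate_epitopes(sequence, length=15)` on a polypeptide of the invention would return a ranked list of windows to consider; any real selection would still be confirmed experimentally.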

The epitope-bearing peptides and polypeptides of the invention may be
produced by any conventional means for making peptides or polypeptides,
including recombinant means using nucleic acid molecules of the invention. For
instance, a short epitope-bearing amino acid sequence may be fused to a larger
polypeptide which acts as a carrier during recombinant production and
purification, as well as during immunization to produce anti-peptide antibodies.
Epitope-bearing peptides also may be synthesized using known methods of
chemical synthesis. For instance, Houghten has described a simple method for
synthesis of large numbers of peptides, such as 10-20 mg of 248 different
13-residue peptides representing single amino acid variants of a segment of the
HA1 polypeptide, which were prepared and characterized (by ELISA-type binding
studies) in less than four weeks. Houghten, R. A. (1985) General method for the
rapid solid-phase synthesis of large numbers of peptides: specificity of
antigen-antibody interaction at the level of individual amino acids. Proc. Natl.
Acad. Sci. USA 82:5131-5135. This "Simultaneous Multiple Peptide Synthesis
(SMPS)" process is further described in U.S. Patent No. 4,631,211 to Houghten
et al. (1986). In this procedure the individual resins for the solid-phase synthesis
of various peptides are contained in separate solvent-permeable packets, enabling
the optimal use of the many identical repetitive steps involved in solid-phase
methods. A completely manual procedure allows 500-1000 or more syntheses to
be conducted simultaneously. Houghten et al., supra, at 5134.

Epitope-bearing peptides and polypeptides of the invention are used to
induce antibodies according to methods well known in the art. See, for instance,
Sutcliffe et al., supra; Wilson et al., supra; Chow, M. et al., Proc. Natl. Acad.
Sci. USA 82:910-914; and Bittle, F. J. et al., J. Gen. Virol. 66:2347-2354 (1985).
Generally, animals may be immunized with free peptide; however, anti-peptide
antibody titer may be boosted by coupling of the peptide to a macromolecular
carrier, such as keyhole limpet hemocyanin (KLH) or tetanus toxoid. For
instance, peptides containing cysteine may be coupled to carrier using a linker
such as m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS), while other
peptides may be coupled to carrier using a more general linking agent such as
glutaraldehyde. Animals such as rabbits, rats and mice are immunized with either
free or carrier-coupled peptides, for instance, by intraperitoneal and/or
intradermal injection of emulsions containing about 100 µg peptide or carrier
protein and Freund's adjuvant. Several booster injections may be needed, for
instance, at intervals of about two weeks, to provide a useful titer of anti-peptide
antibody which can be detected, for example, by ELISA assay using free peptide
adsorbed to a solid surface. The titer of anti-peptide antibodies in serum from an
immunized animal may be increased by selection of anti-peptide antibodies, for
instance, by adsorption to the peptide on a solid support and elution of the
selected antibodies according to methods well known in the art.
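The "useful titer" detected by ELISA above is often reported as an endpoint titer. As a hedged illustration, assuming the endpoint is the highest consecutive dilution whose optical density (OD) exceeds the mean of negative-control wells plus three standard deviations (one common convention, not one specified here), the calculation can be sketched as follows; the function name `endpoint_titer` is hypothetical.

```python
from statistics import mean, stdev


def endpoint_titer(dilutions, od_values, negative_ods, k=3.0):
    """Return the reciprocal dilution (e.g. 1600 for 1:1600) of the highest
    consecutive dilution whose OD exceeds the cutoff, or None if none does.

    Cutoff = mean(negative controls) + k * SD(negative controls); this is one
    common convention, assumed here for illustration.
    """
    cutoff = mean(negative_ods) + k * stdev(negative_ods)
    titer = None
    # walk from the most concentrated (smallest reciprocal dilution) upward
    for dilution, od in sorted(zip(dilutions, od_values)):
        if od > cutoff:
            titer = dilution
        else:
            break  # stop at the first dilution that falls below the cutoff
    return titer
```

With hypothetical readings, `endpoint_titer([100, 200, 400, 800, 1600, 3200], [2.1, 1.8, 1.2, 0.6, 0.2, 0.1], [0.08, 0.10, 0.12])` gives a cutoff of about 0.16 and returns 1600.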

Immunogenic epitope-bearing peptides of the invention, i.e., those parts
of a protein that elicit an antibody response when the whole protein is the
immunogen, are identified according to methods known in the art. For instance,
Geysen et al., supra, discloses a procedure for rapid concurrent synthesis on solid
supports of hundreds of peptides of sufficient purity to react in an enzyme-linked
immunosorbent assay. Interaction of synthesized peptides with antibodies is then
easily detected without removing them from the support. In this manner a peptide
bearing an immunogenic epitope of a desired protein may be identified routinely
by one of ordinary skill in the art. For instance, the immunologically important
epitope in the coat protein of foot-and-mouth disease virus was located by Geysen
et al., supra, with a resolution of seven amino acids by synthesis of an overlapping
set of all 208 possible hexapeptides covering the entire 213 amino acid sequence
of the protein. Then, a complete replacement set of peptides, in which all 20
amino acids were substituted in turn at every position within the epitope, was
synthesized, and the particular amino acids conferring specificity for the reaction
with antibody were determined. Thus, peptide analogs of the epitope-bearing
peptides of the invention can be made routinely by this method. U.S. Patent No.
4,708,781 to Geysen (1987) further describes this method of identifying a peptide
bearing an immunogenic epitope of a desired protein.
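The peptide counts in the Geysen example follow from window arithmetic: a protein of n residues yields n - k + 1 overlapping peptides of length k, so 213 - 6 + 1 = 208 hexapeptides, and a complete replacement set for an epitope of length m contains 20 × m peptides (the parent residue appears once per position). A minimal sketch of generating both sets, with hypothetical function names:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def overlapping_peptides(sequence, k=6):
    """All overlapping peptides of length k, stepping one residue at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def replacement_set(epitope):
    """Every single-position substitution of the epitope by each of the 20
    residues, as (1-based position, residue, peptide) tuples; the parent
    residue is included once per position."""
    variants = []
    for pos in range(len(epitope)):
        for aa in AMINO_ACIDS:
            variants.append((pos + 1, aa, epitope[:pos] + aa + epitope[pos + 1:]))
    return variants
```

For a 213-residue sequence, `overlapping_peptides(seq)` returns 208 hexapeptides, and `replacement_set` on a six-residue epitope returns 120 peptides.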

Further still, U.S. Patent No. 5,194,392 to Geysen (1990) describes a 
general method of detecting or determining the sequence of monomers (amino 
acids or other compounds) which is a topological equivalent of the epitope (i.e., 
a "mimotope") which is complementary to a particular paratope (antigen binding 
site) of an antibody of interest. More generally, U.S. Patent No. 4,833,092 to
Geysen (1989) describes a method of detecting or determining a sequence of
monomers which is a topographical equivalent of a ligand which is
complementary to the ligand binding site of a particular receptor of interest.
Similarly, U.S. Patent No. 5,480,971 to Houghten, R. A. et al. (1996) on
Peralkylated Oligopeptide Mixtures discloses linear C1-C7-alkyl peralkylated
oligopeptides and sets and libraries of such peptides, as well as methods for using 
such oligopeptide sets and libraries for determining the sequence of a peralkylated 
oligopeptide that preferentially binds to an acceptor molecule of interest. Thus, 
non-peptide analogs of the epitope-bearing peptides of the invention also can be 
made routinely by these methods. 

The entire disclosure of each document cited in this section on 
"Polypeptides and Peptides" is hereby incorporated herein by reference. 

As one of skill in the art will appreciate, E. coli PAI polypeptides of the
present invention and the epitope-bearing fragments thereof described above can
be combined with parts of the constant domain of immunoglobulins (IgG),
resulting in chimeric polypeptides. These fusion proteins facilitate purification
and show an increased half-life in vivo. This has been shown, e.g., for chimeric
proteins consisting of the first two domains of the human CD4 polypeptide and
various domains of the constant regions of the heavy or light chains of
mammalian immunoglobulins (EP A 394,827; Traunecker et al., Nature 331:84-86
(1988)). Fusion proteins that have a disulfide-linked dimeric structure due to
the IgG part can also be more efficient in binding and neutralizing other
molecules than the monomeric E. coli J96 PAI proteins or protein fragments
alone (Fountoulakis et al., J. Biol. Chem. 270:3958-3964 (1995)).

Vaccines 

In another embodiment, the present invention relates to a vaccine, 
preferably in unit dosage form, comprising one or more E. coli J96 PAI antigens 
together with a pharmaceutically acceptable diluent, carrier, or excipient, wherein 
the one or more antigens are present in an amount effective to elicit a protective 
immune response in an animal to pathogenic E. coli. Antigens of E. coli J96 PAI 
IV and V may be obtained from polypeptides encoded by the ORFs listed in
Tables 1-6, particularly Tables 1-4, using methods well known in the art. 

In a preferred embodiment, the antigens are E. coli J96 PAI IV or PAI V 
proteins that are present on the surface of pathogenic E. coli. In another preferred 
embodiment, the pathogenic E. coli J96 PAI IV or PAI V protein antigen is
conjugated to an E. coli capsular polysaccharide (CP), particularly to capsular
polysaccharides that are more prevalent in pathogenic strains, to produce a double
vaccine. CPs, in general, may be prepared or synthesized as described in
Schneerson et al., J. Exp. Med. 152:361-376 (1980); Marburg et al., J. Am. Chem.
Soc. 108:5282 (1986); Jennings et al., J. Immunol. 127:1011-1018 (1981); and
Beuvery et al., Infect. Immun. 40:39-45 (1983). In a further preferred
embodiment, the present invention relates to a method of preparing a 
polysaccharide conjugate comprising: obtaining the above-described E. coli J96 
PAI antigen; obtaining a CP or fragment from pathogenic E. coli; and conjugating 
the antigen to the CP or CP fragment. 

In a preferred embodiment, the animal to be protected is selected from the 
group consisting of humans, horses, deer, cattle, pigs, sheep, dogs, and chickens. 
In a more preferred embodiment, the animal is a human or a dog. 

In a further embodiment, the present invention relates to a prophylactic
method whereby the incidence of pathogenic E. coli-induced symptoms is
decreased in an animal, comprising administering to the animal the
above-described vaccine, wherein the vaccine is administered in an amount effective to
elicit protective antibodies in an animal to pathogenic E. coli. This vaccination
method is contemplated to be useful in protecting against severe diarrhea
(pathogenic intestinal E. coli strains), urinary tract infections (uropathogenic E.
coli) and infections of the brain (extraintestinal E. coli). The vaccine of the
invention is used in an effective amount depending on the route of administration.
Although intranasal, subcutaneous or intramuscular routes of administration are
preferred, the vaccine of the present invention can also be administered by an
oral, intraperitoneal or intravenous route. One skilled in the art will appreciate
that the amounts to be administered for any particular treatment protocol can be
readily determined without undue experimentation. Suitable amounts are within
the range of 2 micrograms of the protein per kg body weight to 100 micrograms
per kg body weight.
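Because the stated range is per kilogram, the total protein amount scales linearly with body weight; for a hypothetical 70 kg recipient, 2 to 100 µg/kg corresponds to 140 to 7000 µg. The small helper below only performs that arithmetic; its name and defaults are illustrative.

```python
def dose_range_ug(body_weight_kg, low_ug_per_kg=2.0, high_ug_per_kg=100.0):
    """Total protein range in micrograms for a given body weight, using the
    2-100 micrograms/kg range as defaults (illustrative arithmetic only)."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return low_ug_per_kg * body_weight_kg, high_ug_per_kg * body_weight_kg
```

For example, `dose_range_ug(70)` returns `(140.0, 7000.0)`.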

The vaccine can be delivered through a vector such as BCG. The vaccine 
can also be delivered as naked DNA coding for target antigens.

The vaccine of the present invention may be employed in such dosage 
forms as capsules, liquid solutions, suspensions or elixirs for oral administration, 
or sterile liquid forms such as solutions or suspensions. Any inert carrier is 
preferably used, such as saline, phosphate-buffered saline, or any such carrier in 
which the vaccine has suitable solubility properties. The vaccines may be in the 
form of single dose preparations or in multi-dose flasks which can be used for 
mass vaccination programs. Reference is made to Remington's Pharmaceutical 
Sciences, Mack Publishing Co., Easton, PA, Osol (ed.) (1980); and New Trends 
and Developments in Vaccines, Voller et al (eds.), University Park Press, 
Baltimore, MD (1978), for methods of preparing and using vaccines. 

The vaccines of the present invention may further comprise adjuvants
which enhance production of antibodies and immune cells. Such adjuvants
include, but are not limited to, various oil formulations such as Freund's complete
adjuvant (CFA), the dipeptide known as MDP, saponins (e.g., Quillaja saponin
fraction QA-21, U.S. Patent No. 5,047,540), aluminum hydroxide, or lymphatic
cytokines. Freund's adjuvant is an emulsion of mineral oil and water which is
mixed with the immunogenic substance. Although Freund's adjuvant is powerful,
it is usually not administered to humans. Instead, the adjuvant alum (aluminum
hydroxide) may be used for administration to a human. Vaccine may be adsorbed
onto the aluminum hydroxide, from which it is slowly released after injection.
The vaccine may also be encapsulated within liposomes according to Fullerton,
U.S. Patent No. 4,235,877.

Protein Function 

Each ORF described in Tables 1 and 3 possesses a biological role similar
to the role associated with the identified homologous protein. This allows the
skilled artisan to determine a function for each identified coding sequence. For
example, a partial list of the E. coli protein functions provided in Tables 1 and 3
includes many of the functions associated with virulence of pathogenic bacterial
strains. These include, but are not limited to, adhesins, excretion pathway
proteins, O-antigen/carbohydrate modification, cytotoxins and regulators. A more
detailed description of several of these functions is provided in Example 1 below.

Diagnostic Assays 

In another preferred embodiment, the present invention relates to a
method of detecting pathogenic E. coli nucleic acid in a sample comprising:

a) contacting the sample with the above-described nucleic acid probe,
under conditions such that hybridization occurs, and

b) detecting the presence of the probe bound to pathogenic E. coli nucleic
acid.

In another preferred embodiment, the present invention relates to a
diagnostic kit for detecting the presence of pathogenic E. coli nucleic acid in a
sample comprising at least one container means having disposed therein the
above-described nucleic acid probe.

In another preferred embodiment, the present invention relates to a
diagnostic kit for detecting the presence of pathogenic E. coli antigens in a sample
comprising at least one container means having disposed therein the
above-described antibodies.

In another preferred embodiment, the present invention relates to a
diagnostic kit for detecting the presence of antibodies to pathogenic E. coli
antigens in a sample comprising at least one container means having disposed
therein the above-described antigens.

The present invention provides methods to identify the expression of an
ORF of the present invention, or homolog thereof, in a test sample, using one of
the antibodies of the present invention. Such methods involve incubating a test
sample with one or more of the antibodies of the present invention and assaying
for binding of the antibodies to components within the test sample.

In a further embodiment, the present invention relates to a method for
identifying pathogenic E. coli in an animal comprising analyzing tissue or body
fluid from the animal for a nucleic acid, protein, polypeptide antigen or antibody
specific to one of the ORFs described in Tables 1-4 herein from E. coli J96 PAI
IV or V. Analysis of nucleic acid specific to pathogenic E. coli can be by PCR
techniques or hybridization techniques (cf. Molecular Cloning: A Laboratory
Manual, second edition, edited by Sambrook, Fritsch, & Maniatis, Cold Spring
Harbor Laboratory, 1989; Eremeeva et al., J. Clin. Microbiol. 32:803-810 (1994),
which describes differentiation among spotted fever group Rickettsiae species by
analysis of restriction fragment length polymorphism of PCR-amplified DNA).
Proteins or antibodies specific to pathogenic E. coli may be identified as
described in Molecular Cloning: A Laboratory Manual, second edition,
Sambrook et al., eds., Cold Spring Harbor Laboratory (1989). More specifically,
antibodies may be raised to E. coli J96 PAI proteins as generally described in
Antibodies: A Laboratory Manual, Harlow and Lane, eds., Cold Spring Harbor
Laboratory (1988). E. coli J96 PAI-specific antibodies can also be obtained from
infected animals (Mather, T. et al., JAMA 205:186-188 (1994)).

In another embodiment, the present invention relates to an antibody
having binding affinity specifically to an E. coli J96 PAI antigen as described
above. The E. coli J96 PAI antigens of the present invention can be used to
produce antibodies or hybridomas. One skilled in the art will recognize that if an
antibody is desired, a peptide can be generated as described herein and used as an
immunogen. The antibodies of the present invention include monoclonal and
polyclonal antibodies, as well as fragments of these antibodies. The invention
further includes single chain antibodies. Antibody fragments which contain the
idiotype of the molecule can be generated by known techniques; for example,
such fragments include but are not limited to: the F(ab')2 fragment, Fab'
fragments, Fab fragments, and Fv fragments.

Of special interest to the present invention are antibodies to pathogenic
E. coli antigens which are produced in humans, or are "humanized" (i.e.,
non-immunogenic in a human) by recombinant or other technology. Humanized
antibodies may be produced, for example, by replacing an immunogenic portion
of an antibody with a corresponding, but non-immunogenic portion (i.e., chimeric
antibodies) (Robinson, R.R. et al., International Patent Publication
PCT/US86/02269; Akira, K. et al., European Patent Application 184,187;
Taniguchi, M., European Patent Application 171,496; Morrison, S.L. et al.,
European Patent Application 173,494; Neuberger, M.S. et al., PCT Application
WO 86/01533; Cabilly, S. et al., European Patent Application 125,023; Better,
M. et al., Science 240:1041-1043 (1988); Liu, A.Y. et al., Proc. Natl. Acad. Sci.
USA 84:3439-3443 (1987); Liu, A.Y. et al., J. Immunol. 139:3521-3526 (1987);
Sun, L.K. et al., Proc. Natl. Acad. Sci. USA 84:214-218 (1987); Nishimura, Y.
et al., Cancer Res. 47:999-1005 (1987); Wood, C.R. et al., Nature 314:446-449
(1985); Shaw et al., J. Natl. Cancer Inst. 80:1553-1559 (1988)). General reviews
of "humanized" chimeric antibodies are provided by Morrison, S.L. (Science
229:1202-1207 (1985)) and by Oi, V.T. et al. (BioTechniques 4:214 (1986)).
Suitable "humanized" antibodies can be alternatively produced by CDR or CEA
substitution (Jones, P.T. et al., Nature 321:522-525 (1986); Verhoeyen et al.,
Science 239:1534 (1988); Beidler, C.B. et al., J. Immunol. 141:4053-4060
(1988)).

In another embodiment, the present invention relates to a hybridoma
which produces the above-described monoclonal antibody. A hybridoma is an
immortalized cell line which is capable of secreting a specific monoclonal
antibody.

In general, techniques for preparing monoclonal antibodies and
hybridomas are well known in the art (Campbell, "Monoclonal Antibody
Technology: Laboratory Techniques in Biochemistry and Molecular Biology,"
Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth
et al., J. Immunol. Methods 35:1-21 (1980)).

In another embodiment, the present invention relates to a method of
detecting a pathogenic E. coli antigen in a sample, comprising: a) contacting the
sample with an above-described antibody, under conditions such that
immunocomplexes form, and b) detecting the presence of said antibody bound to
the antigen. In detail, the methods comprise incubating a test sample with one or
more of the antibodies of the present invention and assaying whether the antibody
binds to the test sample.

Conditions for incubating an antibody with a test sample vary. Incubation
conditions depend on the format employed in the assay, the detection methods
employed, and the type and nature of the antibody used in the assay. One skilled
in the art will recognize that any one of the commonly available immunological
assay formats (such as radioimmunoassays, enzyme-linked immunosorbent
assays, diffusion-based Ouchterlony, or rocket immunofluorescent assays) can
readily be adapted to employ the antibodies of the present invention. Examples
of such assays can be found in Chard, An Introduction to Radioimmunoassay and
Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands
(1986); Bullock et al., Techniques in Immunocytochemistry, Academic Press,
Orlando, FL, Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, Practice and
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands
(1985); and Antibodies: A Laboratory Manual, Harlow and Lane, eds., Cold
Spring Harbor Laboratory (1988).

The immunological assay test samples of the present invention include
cells, protein or membrane extracts of cells, or biological fluids such as blood,
serum, plasma, or urine. The test sample used in the above-described method will
vary based on the assay format, nature of the detection method and the tissues,
cells or extracts used as the sample to be assayed. Methods for preparing protein
extracts or membrane extracts of cells are well known in the art and can readily
be adapted in order to obtain a sample which is compatible with the system
utilized.

In another embodiment, the present invention relates to a method of
detecting the presence of antibodies to pathogenic E. coli in a sample,
comprising: a) contacting the sample with an above-described antigen, under
conditions such that immunocomplexes form, and b) detecting the presence of
said antigen bound to the antibody. In detail, the methods comprise incubating
a test sample with one or more of the antigens of the present invention and
assaying whether the antigen binds to the test sample.

In another embodiment of the present invention, a kit is provided which
contains all the necessary reagents to carry out the previously described methods
of detection. The kit may comprise: i) a first container means containing an
above-described antibody, and ii) a second container means containing a conjugate
comprising a binding partner of the antibody and a label. In another preferred
embodiment, the kit further comprises one or more other containers comprising
one or more of the following: wash reagents and reagents capable of detecting
the presence of bound antibodies. Examples of detection reagents include, but are
not limited to, labeled secondary antibodies, or in the alternative, if the primary
antibody is labeled, the chromophoric, enzymatic, or antibody binding reagents
which are capable of reacting with the labeled antibody. The compartmentalized
kit may be as described above for nucleic acid probe kits.

One skilled in the art will readily recognize that the antibodies described
in the present invention can readily be incorporated into one of the established kit
formats which are well known in the art.

Screening Assay for Binding Agents 

Using the isolated proteins described herein, the present invention further
provides methods of obtaining and identifying agents that bind to a protein
encoded by an E. coli J96 PAI ORF or to a fragment thereof.
The method involves:

(a) contacting an agent with an isolated protein encoded by an E. coli
J96 PAI ORF, or an isolated fragment thereof; and

(b) determining whether the agent binds to said protein or said
fragment.

The agents screened in the above assay can be, but are not limited to,
peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The
agents can be selected and screened at random or rationally selected or designed
using protein modeling techniques. For random screening, agents such as
peptides, carbohydrates, pharmaceutical agents and the like are selected at
random and are assayed for their ability to bind to the protein encoded by an ORF
of the present invention.

Alternatively, agents may be rationally selected or designed. As used
herein, an agent is said to be "rationally selected or designed" when the agent is
chosen based on the configuration of the particular protein. For example, one
skilled in the art can readily adapt currently available procedures to generate
peptides, pharmaceutical agents and the like capable of binding to a specific
peptide sequence in order to generate rationally designed antipeptide ligands; for
example, see Hurby et al., Application of Synthetic Peptides: Antisense Peptides,
In Synthetic Peptides, A User's Guide, W.H. Freeman, NY (1992), pp. 289-307,
and Kaspczak et al., Biochemistry 28:9230-8 (1989).

In addition to the foregoing, one class of agents of the present invention
can be used to control gene expression through binding to one of the ORF
encoding regions or EMFs of the present invention. As described above, such
agents can be randomly screened or rationally designed and selected. Targeting
the encoding region or EMF allows a skilled artisan to design sequence-specific
or element-specific agents, modulating the expression of either a single ORF
encoding region or multiple encoding regions that rely on the same EMF for
expression control.

One class of DNA binding agents comprises agents that contain nucleotide base
residues that hybridize or form a triple helix by binding to DNA or RNA. Such
agents can be based on the classic phosphodiester, ribonucleic acid backbone, or
can be a variety of sulfhydryl or polymeric derivatives having base attachment
capacity.

Agents suitable for use in these methods usually contain 20 to 40 bases
and are designed to be complementary to a region of the gene involved in
transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney
et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or
to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991);
Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press,
Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of
RNA transcription from DNA, while antisense RNA hybridization blocks
translation of an mRNA molecule into polypeptide. Both techniques have been
demonstrated to be effective in model systems. Information contained in the
sequences of the present invention is necessary for the design of an antisense or
triple helix oligonucleotide and other DNA binding agents.
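Because an antisense oligonucleotide must base-pair with its mRNA target, its sequence is the reverse complement of the chosen target stretch. The sketch below assumes a DNA oligonucleotide (T rather than U) and enforces the 20 to 40 base length stated above; the function name and parameters are hypothetical, and practical design would also weigh target accessibility, GC content and off-target matches, none of which is modeled here.

```python
# Map each mRNA/DNA base to its DNA complement (U in RNA pairs with A).
_DNA_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_oligo(mrna, start, length=20):
    """Return a DNA oligonucleotide (5'->3') complementary to
    mrna[start:start + length]; length must be 20 to 40 bases."""
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("target window runs past the end of the mRNA")
    # complement each base, then reverse so the oligo reads 5'->3'
    return target.translate(_DNA_COMPLEMENT)[::-1]
```

For example, the 20-base target `"AUGC" * 5` yields the oligonucleotide `"GCAT" * 5`.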

Computer Related Embodiments 

The nucleotide sequence provided in SEQ ID NOs: 1 through 142,
representative fragments thereof, or nucleotide sequences at least 99.9% identical
to the sequences provided in SEQ ID NOs: 1 through 142, can be "provided" in
a variety of media to facilitate use thereof. As used herein, "provided" refers to
a manufacture, other than an isolated nucleic acid molecule, that contains a
nucleotide sequence of the present invention, i.e., the nucleotide sequence
provided in SEQ ID NOs: 1 through 142, a representative fragment thereof, or a
nucleotide sequence at least 99.9% identical to SEQ ID NOs: 1 through 142.
Such a manufacture provides the E. coli J96 PAI subgenomes or a subset thereof
(e.g., one or more E. coli J96 PAI open reading frames (ORFs)) in a form that
allows a skilled artisan to examine the manufacture using means not directly
applicable to examining the E. coli J96 PAI subgenome or a subset thereof as it
exists in nature or in purified form.
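The 99.9% identity threshold can be checked mechanically. The sketch below computes ungapped identity between two sequences of equal length; this is a simplification that assumes the sequences are already aligned without gaps (in practice identity is usually taken from an alignment, for example one produced by BLAST). For a 1,000 nt sequence, 99.9% identity permits exactly one mismatch. The function names are hypothetical.

```python
def percent_identity(a, b):
    """Ungapped percent identity of two equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(a, b, threshold=99.9):
    """True if the ungapped identity is at least `threshold` percent."""
    return percent_identity(a, b) >= threshold
```

With a 1,000 nt reference, one substitution gives 99.9% and passes; two substitutions give 99.8% and do not.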

In one application of this embodiment, one or more nucleotide sequences
of the present invention can be recorded on computer readable media. As used
herein, "computer readable media" refers to any medium that can be read and
accessed directly by a computer. Such media include, but are not limited to:
magnetic storage media, such as floppy discs, hard disc storage medium, and
magnetic tape; optical storage media such as CD-ROM; electrical storage media
such as RAM and ROM; and hybrids of these categories such as magnetic/optical
storage media. A skilled artisan can readily appreciate how any of the presently
known computer readable media can be used to create a manufacture
comprising computer readable medium having recorded thereon a nucleotide
sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on
computer readable medium. A skilled artisan can readily adopt any of the
presently known methods for recording information on computer readable medium
to generate manufactures comprising the nucleotide sequence information of the
present invention. A variety of data storage structures are available to a skilled
artisan for creating a computer readable medium having recorded thereon a
nucleotide sequence of the present invention. The choice of the data storage
structure will generally be based on the means chosen to access the stored
information. In addition, a variety of data processor programs and formats can
be used to store the nucleotide sequence information of the present invention on
computer readable medium. The sequence information can be represented in a
word processing text file, formatted in commercially available software such as
WordPerfect and Microsoft Word, or represented in the form of an ASCII file,
stored in a database application, such as DB2, Sybase, Oracle, or the like. A
skilled artisan can readily adapt any number of data processor structuring formats
(e.g., text file or database) in order to obtain computer readable medium having
recorded thereon the nucleotide sequence information of the present invention.
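As a minimal sketch of the two storage options named above (an ASCII file and a database application), assuming only Python's standard library: sequences are written to a plain ASCII file in FASTA layout, read back, and loaded into a relational table. SQLite stands in here for DB2, Sybase or Oracle, and the table schema (`contig`, `name`, `seq`) and the demonstration records are hypothetical, not sequences from SEQ ID NOs: 1 through 142.

```python
import sqlite3
import tempfile
from pathlib import Path


def write_fasta(records: dict, path: Path, width: int = 60) -> None:
    """Write {name: sequence} to an ASCII FASTA file, wrapping lines at `width`."""
    with path.open("w", encoding="ascii") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def read_fasta(path: Path) -> dict:
    """Parse a FASTA file back into a {name: sequence} dictionary."""
    records, name = {}, None
    for line in path.read_text(encoding="ascii").splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.strip()
    return records


def load_into_db(records: dict, db_path: str = ":memory:") -> sqlite3.Connection:
    """Store sequences in a relational table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS contig (name TEXT PRIMARY KEY, seq TEXT NOT NULL)")
    conn.executemany("INSERT OR REPLACE INTO contig (name, seq) VALUES (?, ?)", records.items())
    conn.commit()
    return conn


# Hypothetical demonstration records.
demo = {"demo_contig_1": "ATGC" * 40, "demo_contig_2": "GGCATT"}
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "demo.fa"
    write_fasta(demo, fasta_path)
    round_trip = read_fasta(fasta_path)
conn = load_into_db(round_trip)
rows = conn.execute("SELECT name, length(seq) FROM contig ORDER BY name").fetchall()
```

The same records are accessible either as a text file or through SQL queries, which is the practical difference between the two formats the paragraph lists.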

By providing the nucleotide sequences of SEQ ID NOs: 1 through 142,
representative fragments thereof, or nucleotide sequences at least 99.9% identical
to SEQ ID NOs: 1 through 142, in computer readable form, a skilled artisan can
routinely access the sequence information for a variety of purposes. Computer
software is publicly available which allows a skilled artisan to access sequence
information provided in a computer readable medium. The examples which
follow demonstrate how software which implements the BLAST (Altschul et al.,
J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem.
17:203-207 (1993)) search algorithms on a Sybase system can be used to identify
open reading frames (ORFs) within the E. coli J96 PAI subgenome that contain
homology to ORFs or proteins from other organisms. Such ORFs are protein-encoding
fragments within the E. coli J96 PAI subgenome and are useful in
producing commercially important proteins such as enzymes used in modifying
surface O-antigens of bacteria. A comprehensive list of ORFs encoding
commercially important E. coli J96 PAI proteins is provided in Tables 1 through
6.
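The BLAST and BLAZE searches described above compare candidate ORFs with sequences from other organisms, but the candidates first have to be delimited. A minimal six-frame scan is sketched below, assuming the standard genetic code, ATG-only start codons (real bacterial genes also start at GTG and TTG) and a hypothetical default minimum of 100 codons. It is not the GeneMark procedure used in the Examples and it performs no homology search.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Return (strand, start, end) for ATG...stop ORFs in all six reading frames.

    Coordinates are 0-based, end-exclusive and always given on the forward
    strand; min_codons counts the stop codon. Only the first ATG before each
    stop is used, so nested in-frame starts are not reported separately.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append(("+", start, i + 3))
                        else:
                            # map reverse-strand coordinates back onto the forward strand
                            orfs.append(("-", n - (i + 3), n - start))
                    start = None
    return sorted(orfs, key=lambda orf: orf[1])
```

For example, `find_orfs("ATGAAATAG", min_codons=3)` reports one three-codon ORF on the forward strand; the translated ORFs would then be passed to a homology search tool.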

The present invention provides a DNA sequence - gene database of 
pathogenicity islands (PAIs) for E. coli involved in infectious diseases. This 
database is useful for identifying and characterizing the basic functions of new 
virulence genes for E. coli involved in uropathogenic and extraintestinal diseases. 
The database provides a number of novel open reading frames that can be 
selected for further study as described herein. 

Selectable insertion mutations in plasmid subclones encoding PAI genes
with potentially significant phenotypes for E. coli uropathogenesis and sepsis can
be isolated. The mutations are then crossed back into wild-type, uropathogenic
E. coli by homologous recombination to create wild-type strains specifically
altered in the targeted gene. The significance of the genes to E. coli pathogenesis
is assessed by in vitro assays and in vivo murine models of sepsis/peritonitis and
ascending urinary tract infection.

New virulence genes and PAI sites in uropathogenic E. coli may be 
identified by the transposon signature-tagged mutagenesis system and negative 
selection of E. coli mutants avirulent in murine models of ascending urinary tract 
infection or peritonitis. 

Epidemiological investigations of new virulence genes and PAIs may be
used to test for their occurrence in the genomes of other pathogenic and
opportunistic members of the Enterobacteriaceae.

One can choose from the ORFs included in SEQ ID NOs: 1 through 142,
using Tables 1 through 6 as a useful guidepost for selecting, as candidates for
targeted mutagenesis, a limited number of candidate genes within the PAIs based
on their homology to virulence, export or regulation genes in other pathogens.
For the large number of apparent genes within the PAIs that do not share
sequence similarity to any entries in the database, the transposon signature-tagged
mutagenesis method developed by David Holden's laboratory can be employed
as an independent means of virulence gene identification.

Allelic knock-outs are constructed using different pir-dependent suicide
vectors (Swihart, K.A. and R.A. Welch, Infect. Immun. 55:1853-1869 (1990)).
In addition, two different animal model systems can be employed for assessment
of pathogenic determinants. The initial identification of E. coli hemolysin as a
virulence factor came from the construction of isogenic E. coli strains that were
tested in a rat model of intra-abdominal sepsis (Welch, R.A. et al., Nature
(London) 294:665-667 (1981)). The ascending UTI (urinary tract infection)
mouse model was also successfully performed with allelic knock-outs of the
hpmA hemolysin of Proteus mirabilis (Swihart, K.A. and R.A. Welch, Infect.
Immun. 55:1853-1869 (1990)).
The present invention further provides systems, particularly computer-based
systems, which contain the sequence information described herein. Such
systems are designed to identify commercially important fragments of the E. coli
J96 PAI subgenome. As used herein, "a computer-based system" refers to the
hardware means, software means, and data storage means used to analyze the
nucleotide sequence information of the present invention. The minimum
hardware means of the computer-based systems of the present invention
comprises a central processing unit (CPU), input means, output means, and data
storage means. A skilled artisan can readily appreciate that any one of the
currently available computer-based systems is suitable for use in the present
invention.

As indicated above, the computer-based systems of the present invention
comprise a data storage means having stored therein a nucleotide sequence of the
present invention and the necessary hardware means and software means for
supporting and implementing a search means. As used herein, "data storage
means" refers to memory that can store nucleotide sequence information of the
present invention, or a memory access means which can access manufactures
having recorded thereon the nucleotide sequence information of the present
invention. As used herein, "search means" refers to one or more programs which
are implemented on the computer-based system to compare a target sequence or
target structural motif with the sequence information stored within the data
storage means. Search means are used to identify fragments or regions of the E.
coli genome that match a particular target sequence or target motif. A variety of
known algorithms are disclosed publicly, and a variety of commercially available
software packages for conducting searches are available and can be used in the
computer-based systems of the present invention. Examples of such software
include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX
(NCBI). A skilled artisan can readily recognize that any one of the available
algorithms or implementing software packages for conducting homology searches
can be adapted for use in the present computer-based systems.
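BLAST-type search means begin by locating short exact "word" matches between the target and the stored sequences, and only then extend and score them. The sketch below shows only that seeding step, under stated assumptions: it does no extension or scoring, the function names are hypothetical, and the default word size of 11 is an illustrative choice.

```python
from collections import defaultdict


def build_word_index(fragments: dict, w: int = 11) -> dict:
    """Map each length-w word to the (fragment name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in fragments.items():
        seq = seq.upper()
        for i in range(len(seq) - w + 1):
            index[seq[i:i + w]].append((name, i))
    return dict(index)


def seed_hits(query: str, index: dict, w: int = 11) -> list:
    """Return (query offset, fragment name, fragment offset) for every exact word match."""
    query = query.upper()
    hits = []
    for q in range(len(query) - w + 1):
        for name, pos in index.get(query[q:q + w], []):
            hits.append((q, name, pos))
    return hits


# Toy example with a word size of 4 so the hits are easy to check by eye.
index = build_word_index({"frag1": "AAAACCCCGGGGTTTT"}, w=4)
hits = seed_hits("CCCCGG", index, w=4)
```

All three seeds in the toy example lie on the same diagonal (fragment offset minus query offset equals 4). A full search tool would extend along such diagonals to build and score a local alignment.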
As used herein, a "target sequence" can be any DNA or amino acid
sequence of six or more nucleotides or two or more amino acids. A skilled
artisan can readily recognize that the longer a target sequence is, the less likely
it is to be present as a random occurrence in the database. The
most preferred length of a target sequence is from about 10 to 100
amino acids or from about 30 to 300 nucleotide residues. However, it is well
recognized that target sequences used in searches for commercially important fragments of the E.
coli J96 PAI subgenome, such as sequence fragments involved in gene expression
and protein processing, may be of shorter length.
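The length argument above can be made concrete. If residues are assumed to be uniformly distributed and independent (real genomes are not), the expected number of chance exact matches of one fixed target of length L in a database of N positions is about s(N - L + 1)/A^L, where A is the alphabet size (4 for DNA, 20 for protein) and s is the number of strands searched. The function name below is hypothetical, and the 230 kb database size is simply the sum of the 95 kb and 135 kb contig totals reported in Example 1.

```python
def expected_random_hits(db_length: int, target_length: int,
                         alphabet_size: int = 4, strands: int = 2) -> float:
    """Expected number of chance exact matches of one fixed target.

    Assumes a uniform, independent residue composition; use strands=1 and
    alphabet_size=20 for a protein database.
    """
    positions = max(db_length - target_length + 1, 0)
    return strands * positions / alphabet_size ** target_length


# 230 kb approximates the 95 kb + 135 kb of contigs reported in Example 1.
for length in (6, 10, 30):
    print(length, expected_random_hits(230_000, length))
```

Under these assumptions, a 6-nucleotide target is expected to match about 112 times by chance and a 10-mer about 0.44 times, while a 30-mer essentially never matches by chance. This is consistent with the preference for targets of about 30 nucleotides or more.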

As used herein, "a target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the
sequence(s) are chosen based on a three-dimensional configuration which is
formed upon the folding of the target motif. There are a variety of target motifs
known in the art. Protein target motifs include, but are not limited to, enzymic
active sites and signal sequences. Nucleic acid target motifs include, but are not
limited to, promoter sequences, hairpin structures and inducible expression
elements (protein binding sequences).
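Promoter elements, one of the nucleic acid motif classes listed above, are often written with IUPAC degeneracy codes. The sketch below searches for such a degenerate sequence pattern. It works at the sequence level only and does not model the three-dimensional folding that defines structural motifs, such as hairpins. The degenerate pattern `TATRAT` used in the example is a hypothetical relaxation of the E. coli sigma-70 "-10" consensus TATAAT.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(seq: str, motif: str) -> list:
    """Return (0-based start, matched text) for every match, overlaps included.

    Only the given strand is scanned; search the reverse complement separately.
    """
    body = "".join(IUPAC[c] for c in motif.upper())
    pattern = re.compile(f"(?=({body}))")  # zero-width lookahead allows overlapping hits
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


print(find_motif("GGTATAATCCTATGATC", "TATRAT"))
```

The lookahead form is used so that overlapping occurrences (for example, in low-complexity regions) are all reported rather than only the first of each overlapping run.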

Thus, the present invention further provides an input means for receiving
a target sequence, a data storage means for storing the target sequence and the
homologous E. coli J96 PAI sequence identified using a search means as
described above, and an output means for outputting the identified homologous
E. coli J96 PAI sequence. A variety of structural formats for the input and output
means can be used to input and output information in the computer-based systems
of the present invention. A preferred format for an output means ranks fragments
of the E. coli J96 PAI subgenome possessing varying degrees of homology to the
target sequence or target motif. Such presentation provides a skilled artisan with
a ranking of sequences which contain various amounts of the target sequence or
target motif and identifies the degree of homology contained in the identified
fragment.
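A ranked output of the kind described above can be sketched with a deliberately simple similarity measure: the best ungapped percent identity of the target at any offset within each stored fragment. This measure is an assumption standing in for real alignment scores (it allows no gaps and uses no substitution matrix), and the fragment names are hypothetical.

```python
def best_ungapped_identity(target: str, fragment: str) -> tuple:
    """Slide target along fragment without gaps; return (best identity, offset)."""
    t, f = target.upper(), fragment.upper()
    best_identity, best_offset = 0.0, -1
    for offset in range(len(f) - len(t) + 1):
        same = sum(a == b for a, b in zip(t, f[offset:offset + len(t)]))
        if same / len(t) > best_identity:  # keep the earliest offset on ties
            best_identity, best_offset = same / len(t), offset
    return best_identity, best_offset


def rank_fragments(target: str, fragments: dict) -> list:
    """Rank fragments (at least as long as the target) by best ungapped identity."""
    rows = []
    for name, seq in fragments.items():
        if len(seq) >= len(target):
            identity, offset = best_ungapped_identity(target, seq)
            rows.append((name, identity, offset))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical fragments; "frag_c" is too short to hold the target and is skipped.
ranking = rank_fragments("ACGTACGT", {"frag_a": "TTACGTACGTTT",
                                      "frag_b": "ACGAACGTAA",
                                      "frag_c": "GGG"})
```

Each output row gives the fragment, its degree of homology (here, the fraction of identical positions) and where the best match lies. These are the elements the preferred output format is described as presenting.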

A variety of comparing means can be used to compare a target sequence
or target motif with the data storage means to identify sequence fragments of the
E. coli J96 PAI subgenomes. For example, implementing software which
implements the BLAST and BLAZE algorithms (Altschul et al., J. Mol. Biol.
215:403-410 (1990)) can be used to identify open reading frames within the E.
coli J96 PAI subgenome. A skilled artisan can readily recognize that any one of
the publicly available homology search programs can be used as the search means
for the computer-based systems of the present invention.

One application of this embodiment is provided in Figure 2. Figure 2
provides a block diagram of a computer system 102 that can be used to
implement the present invention. The computer system 102 includes a processor
106 connected to a bus 104. Also connected to the bus 104 are a main memory
108 (preferably implemented as random access memory, RAM) and a variety of
secondary storage devices 110, such as a hard drive 112 and a removable medium
storage device 114. The removable medium storage device 114 may represent,
for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc.
A removable storage medium 116 (such as a floppy disk, a compact disk, a
magnetic tape, etc.) containing control logic and/or data recorded therein may be
inserted into the removable medium storage device 114. The computer system
102 includes appropriate software for reading the control logic and/or the data
from the removable medium storage device 114 once inserted in the removable
medium storage device 114.

A nucleotide sequence of the present invention may be stored in a well
known manner in the main memory 108, any of the secondary storage devices
110, and/or a removable storage medium 116. Software for accessing and
processing the genomic sequence (such as search tools, comparing tools, etc.)
resides in main memory 108 during execution.

Having generally described the invention, the same will be more readily 
understood by reference to the following examples, which are provided by way 
of illustration and are not intended as limiting. 
Experimental

Example 1: High-Throughput Sequencing of Cosmid Clones Covering PAI IV
and PAI V in E. coli J96

The complete DNA sequence of the pathogenicity islands PAI IV and
PAI V (> 170 kb and ~ 110 kb, respectively) from the uropathogenic E. coli strain J96
(O4:K6) was determined using a strategy, cloning and sequencing methods, data
collection and assembly software essentially identical to those used by the TIGR
group for determining the sequence of the Haemophilus influenzae genome
(Fleischmann, R.D., et al., Science 269:496 (1995)). The sequences were then
used for DNA and protein sequence similarity searches of the databases as
described in Fleischmann, Id.

The analysis of the genetic information found within the PAIs of E. coli
J96 was facilitated by the use of overlapping cosmid clones possessing these
unique segments of DNA. These cosmid clones were previously constructed and
mapped (as further described below) as an overlapping set in the laboratory of Dr.
Doug Berg (Washington University). A gap exists between the left portion of
cosmid 2 and the end of PAI IV that would represent the pheV junction to the
E. coli K-12 genome.

Uropathogenic strain E. coli J96 (O4:K6) was used as a source of
chromosomal DNA for construction of a cosmid library. E. coli K-12 DH5α and
DH12 (Gibco/BRL, Gaithersburg, Md.) were used as hosts for maintaining
cosmid and plasmid clones. The cosmid library of E. coli J96 DNA was
constructed essentially as described by Bukanov & Berg (Mol. Microbiol. 11:509-523
(1994)). DNA was digested with Sau3AI under conditions that generated
fragments with an average size of 40 to 50 kb and electrophoresed through 1%
agarose gels. Fragments of 35 to 50 kb were isolated and cloned into the Lorist 6
vector that had been linearized with BamHI and treated with bacterial alkaline
phosphatase to block self-ligation. (Lorist 6 is a 5.2-kb moderate-copy-number
cosmid vector with T7 and SP6 promoters close to the cloning site.) Cloned
DNA was packaged in lambda phage particles in vitro by using a commercial kit
(Amersham, Arlington Heights, IL), and cosmid-containing phage particles were
used to transduce E. coli DH5α. Transductant colonies were transferred to 150
µL of Luria-Bertani broth supplemented with kanamycin in 96-well microtiter
plates and grown overnight at 37 °C with shaking. Two sets of clones, one for
each PAI, were ultimately assembled as previously described (Swenson et al.,
Infection and Immunity 64:3736-3743 (1996), fully incorporated by reference
herein).

The two sets of clones contain eleven subclones that were employed in
the sequencing method described below. One set of four overlapping cosmid
clones covers the prs-containing PAI V, ATCC Deposit No. 97727, deposited
September 23, 1996. A second set of seven subclones covers much of the
pap-containing PAI IV, ATCC Deposit No. 97726, deposited September 23, 1996. See
Figure 1.

A high-throughput, random sequencing method (Fleischmann et al.,
Science 269:496 (1995); Fraser et al., Science 270:397 (1995)) was used to obtain
the sequences of 142 fragments (contigs) of the E. coli J96 PAIs. All clones were
sequenced from both ends to aid in the eventual ordering of contigs during the
sequence assembly process. Briefly, random libraries of ~2 kb clones covering
the two J96 PAIs were constructed, ~2,800 clones were subjected to automated
sequencing (~450 nt/clone), and preliminary assemblies of the sequences were
accomplished, resulting in 142 contigs for the two PAIs that total 95
and 135 kb, respectively. The estimated sizes of PAI IV and PAI V based on
the overlapping cosmid clones are 1.7 x 10^5 and 1.1 x 10^5 bp, respectively. The
142 sequences were assembled by means of the TIGR Assembler (Fleischmann
et al.; Fraser et al.; Sutton et al., Genome Sci. Tech. 1:9 (1995)). Sequence and
physical gaps were closed using a combination of strategies (Fleischmann et al.;
Fraser et al.). Presently, the average depth of sequencing for each base assembled
in the contigs is 6-fold. The tentative identities of many genes based on sequence
homology are listed in Tables 1, 3, 5 and 6.
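To illustrate how overlapping random reads are merged into contigs, here is a toy greedy assembler. It is not the TIGR Assembler: it assumes error-free reads and exact overlaps, ignores base-quality values, repeats and the paired end-sequence information mentioned above, and uses hypothetical read strings.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_overlap: int = 3) -> list:
    """Repeatedly merge the pair of sequences with the largest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # drop reads contained entirely within another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # no remaining overlaps: the rest are separate contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]


reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]  # hypothetical reads
print(greedy_assemble(reads))
```

The three reads overlap by four bases each and collapse into one contig. Reads that share no sufficient overlap remain as separate contigs, which corresponds to the sequence gaps that must be closed by other strategies.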
Open reading frames (ORFs) and predicted protein-coding regions were
identified as described (Fleischmann et al.; Fraser et al.) with some modifications.
In particular, the statistical prediction of uropathogenic E. coli J96 pathogenicity
island genes was performed with GeneMark (Borodovsky, M. & McIninch, J.,
Comput. Chem. 17:123 (1993)). Regular GeneMark uses nonhomogeneous
Markov models derived from a training set of coding sequences and ordinary
Markov models derived from a training set of noncoding sequences. The ORFs
in Tables 1-6 were identified by GeneMark using a second-order Markov model
trained on known E. coli coding regions and known E. coli non-coding regions.
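The idea behind this kind of statistical gene prediction can be sketched as a log-likelihood ratio between two second-order Markov models. The coding model keeps a separate table for each codon position, which makes it nonhomogeneous (three-periodic). The noncoding model uses a single table. This is a simplified illustration, not GeneMark: GeneMark's frame handling, posterior probabilities and training procedure differ. The sketch also assumes in-frame training sequences, add-one smoothing and hypothetical toy training data.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=2, periodic=False, pseudo=1.0):
    """Estimate P(base | preceding `order` bases) with add-`pseudo` smoothing.

    With periodic=True a separate table is kept for each codon position
    (i % 3), which assumes every sequence starts at the first base of a codon.
    """
    counts = defaultdict(Counter)
    for s in seqs:
        for i in range(order, len(s)):
            phase = i % 3 if periodic else 0
            counts[(phase, s[i - order:i])][s[i]] += 1

    def prob(i, context, base):
        table = counts.get((i % 3 if periodic else 0, context), Counter())
        return (table[base] + pseudo) / (sum(table.values()) + pseudo * len(ALPHABET))

    return prob


def coding_log_odds(seq, coding, noncoding, order=2):
    """Sum of log(P_coding / P_noncoding) over positions; positive favours coding."""
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        score += math.log(coding(i, context, base) / noncoding(i, context, base))
    return score


# Toy training data (hypothetical and far too small for real use).
coding_model = train_markov(["ATGGCTGCTGCTGCTGCTGCTGCTTAA"], periodic=True)
noncoding_model = train_markov(["ATATATATATATATATATATATATAT"])
print(coding_log_odds("ATGGCTGCTGCTTAA", coding_model, noncoding_model))
print(coding_log_odds("ATATATATATAT", coding_model, noncoding_model))
```

A positive score means the codon-position-dependent model explains the sequence better than the noncoding model. In practice the models would be trained on large sets of known E. coli coding and non-coding regions.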

Among the important genes that are implicated in the virulence of E. coli
J96 PAIs are adhesins, excretion pathway proteins, proteins that participate in
alterations of the O-antigen in the PAIs, cytotoxins, and two-component
(membrane sensor/DNA binding) proteins.

I. Adhesins. It is believed that the principal adhesin determinants
involved in uropathogenicity that are present within PAIs of uropathogenic E. coli
are the pili encoded by the pap-related operons (Hultgren et al., Infect. Immun.
50:370-377 (1993); Stromberg et al., EMBO J. 9:2001-2010 (1990); High et al.,
Infect. Immun. 56:513-517 (1988)) and the distantly related afimbrial adhesins
(Labigne-Roussel et al., Infect. Immun. 46:251-259 (1988)). The presence of two
of these (pap and prs) has been confirmed. In addition, potential genes for five
other adhesins have been found: sla (described above), AIDA-I (diffuse adherence,
DAEC), hra (heat-resistant hemagglutinin, ETEC), fha (filamentous
hemagglutinin, Bordetella pertussis) and the arg-gingipain proteinase of
Porphyromonas gingivalis.

II. Type II exoprotein secretion pathway. Highly significant
statistics support the presence of multiple genes involved in the type II exoprotein
pathway. Curiously, two different determinants appear to be present in
PAI IV: one set of genes has the highest sequence similarity to eps-like
genes (Vibrio cholerae Ctx export) and the other has the greatest similarity to exe
genes (Aeromonas hydrophila aerolysin and protease export). At present, the
assembly of contigs involving these potential genes is incomplete. Thus, it is
uncertain whether two separate and complete determinants are present. However, it is
clear that these genes are newly discovered and novel to pathogenic E. coli
because the derived sequences do not have either the bfp or hop genes as the
highest matches. The gene products that are the target of the type II export
pathway are not evident at this time.

Within PAI IV there are sequences which suggest genes very similar to
secD and secF. These two linked genes encode homologous products that are
localized to the inner membrane and are hypothesized to play a late role in the
translocation of leader-peptide-containing proteins across the inner membrane of
gram-negative bacteria. In addition, in each PAI, sequences are found that are
reminiscent of the heat-shock htrA/degA gene that encodes a periplasmic
protease. They may perform an endochaperone-like function, as Pugsley et al. have
hypothesized for different exoprotein pathways.

III. O-antigen/capsule/carbohydrate modification (Nod genes). J96
has the O4 antigen. The O-antigen portion of lipopolysaccharide is encoded by rfb genes
that are located at 45 min on the E. coli chromosome. We have found in both
PAIs a cumulative total of five possible rfb-like genes which could participate in
alterations of the O-antigen. Overall, these data suggest that PAIs
provide the genetic potential for greater change of the cell surface in
uropathogenic E. coli strains than was previously known.

The apparent capsule type for strain J96 is a non-sialic acid K6 type.
Sequence similarity "hits" were made in the PAI IV region to two region-1 capsule
genes, kpsS and kpsE, involved in the stabilization of polysaccharide synthesis and
polysaccharide export across the inner membrane. This is not altogether
surprising based on the genetic mapping of the kps locus to serA at 63 minutes on
the genome of the K1 capsular type of E. coli. This suggests that these kps-like
genes either participate in K6 biosynthesis or perhaps are involved in
complex carbohydrate export for other purposes.

Also intriguing are the hits made on genes involved in bacteria-plant
interactions by Rhizobium, Bradyrhizobium and Agrobacterium. Four
potential genes identified thus far share significant sequence similarity to genes
encoding products that modify lipo-oligosaccharides that influence nodule
morphogenesis on legume roots. These are: ORF140, carbamyl phosphate
synthetase; nodulation protein 1265; phosphate-regulatory protein; and an ORF
at a plant-inducible locus in Agrobacterium. To date there are no descriptions in
the literature of such gene products being utilized by human or animal bacterial
pathogens for the purposes of modification or secretion of extracellular
carbohydrate. However, sequence similarity to the capsular region-2 genes
and to lipooligosaccharide biosynthetic genes in Rhizobium spp. has recently been
noted by Petit (1995).

IV. Cytotoxins. Besides the previously known hemolysin and CNF
toxins in the PAIs, in each PAI sequences similar to the shlBA operon (cosmids 5
and 12) for a cytolytic toxin from Serratia marcescens and Proteus
mirabilis were found. Ironically, the P. mirabilis hemolysin (HpmA) member of this family
of toxins was discovered by Uphoff and Welch (1990), but was not thought to exist
in other members of the Enterobacteriaceae (Swihart (1990)). A shlB-like
transporter also appears to be involved in the export of the filamentous
hemagglutinin of Bordetella pertussis, described above, and of a cell
surface adhesin of Haemophilus influenzae. It has been demonstrated that cosmid
#5 of E. coli J96 encodes an extracellular protein of ~180 kDa that is cross-reactive
with polyclonal antisera to the P. mirabilis HpmA hemolysin. Thus, there
is evidence suggesting that there is a new member of this family of proteins in
extraintestinal E. coli isolates. In addition, there is also a hit on the FhaC
hemolysin-like gene within PAI V, although its statistical significance for the
sequence thus far available is only 0.0043.

V. Regulators. A common regulatory motif in bacteria is the two-component
(membrane sensor/DNA binding) protein system. In numerous instances in
pathogenic bacteria, external signals in the environment cause membrane-bound
protein kinases to phosphorylate a cytoplasmic protein which in turn acts as either
a negative or positive effector of transcription of large sets of operons. On
cosmid 11, representing PAI V, sequences for two-component regulators were found
in two different PstI clones (similar probabilities for OmpR/AlgB and
separately RcsC, probabilities at the 10 " level).

In addition, the phosphoglycerate transport system (pgtA, pgtC, and pgtP),
including the pgtB regulator, is present in PAI IV. This transport system, which
was originally described in S. typhimurium, has not been recognized as a component of
any pathogenic E. coli genome. The operon had been previously mapped at 49
minutes, near or within one of the S. typhimurium chromosome-specific loops not
present in the K-12 genome. It should be noted that the E. coli K-12 glpT gene
product is similar to the pgtP gene product (37% identity), but the E. coli J96 genes
are clearly homologs of the pgt genes, and their linkage within the middle of the PAI
IV element (cosmid #4) is suspicious.

VI. Mobile genetic elements. There are numerous sequences that
share similarity to genes found on insertion elements, plasmids and phages. The
temperate bacteriophage P4 inserts within tRNA loci in the E. coli chromosome.
The hypothesis was made that PAIs are the result of bacteriophage P4-virulence
gene recombination events (Blum et al., Infect. Immun. 62:606-614 (1994)). Data
supporting this hypothesis were found during our sequencing with the
identification of P4-like sequences in each of the PAIs (cosmids 7 and 9). This
is a very important preliminary result which supports the hypothesis that PAIs can
be identified by common sequence or genetic elements. However, there are
indications that multiple mobile genetic elements were involved in the evolution of the
J96 PAIs. Conjugal plasmid-related sequences may also be present at two
different locations (F factor and R1 plasmid). Sequences for multiple transposable
elements are present that are likely to have originated from different bacterial
genera (Tn1000, IS630, IS911, IS100, IS21, IS1203, IS5376 (B.
stearothermophilus) and RHS). Of particular interest is IS100, which was
originally identified in Yersinia pestis (Fetherston et al., Mol. Microbiol. 6:2693-2704
(1992)). The presence of IS100 is significant because it has been associated
with the termini of a large chromosomal element encoding pigmentation and
some aspect of virulence in Y. pestis. This element undergoes spontaneous
deletions similar to the PAIs from E. coli 536 (Fetherston et al., Mol. Microbiol.
6:2693-2704 (1992)) and appears to participate in plasmid-chromosome
rearrangements. This element was not previously known to be in genera outside
of Yersinia.

The discovery of the apparent att site for bacteriophage P2 in the PAIs is
interesting. P2 acts as a helper phage for the P4 satellite phage. The P2 att site
is at 44 min in the K-12 genome. The significance of this hit is unknown at
present, but it may be explained either as a cloning artifact (some K-12 fragments
in the PstI library of cosmid 5) or as evidence of some curious chromosomal-P4/P2
phage history. It may indicate that the J96 PAIs are composites of multiple
smaller PAIs.



Example 2: Preparation of PCR Primers and Amplification of DNA 

Various fragments of the sequenced E. coli J96 PAIs, such as those
disclosed in Tables 1 through 6, can be used, in accordance with the present
invention, to prepare PCR primers. The PCR primers are preferably at least 15
bases, and more preferably at least 18 bases, in length. When selecting a primer
sequence, it is preferred that the primers of a pair have approximately the same G/C
ratio, so that their melting temperatures are approximately the same. The PCR primers
are useful during PCR cloning of the ORFs described herein.
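The primer-selection guidance above (a minimum length and matched G/C ratios) can be checked automatically. The sketch below also reports the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough estimate for short oligonucleotides. The 18-base default follows the "more preferably" length stated above. The 10-percentage-point G/C tolerance and all function names are assumptions.

```python
def gc_fraction(primer: str) -> float:
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature (deg C): 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def check_pair(fwd: str, rev: str, min_len: int = 18, max_gc_diff: float = 0.10) -> list:
    """Return a list of problems with a primer pair (empty if none are found)."""
    problems = []
    for label, primer in (("forward", fwd), ("reverse", rev)):
        if len(primer) < min_len:
            problems.append(f"{label} primer shorter than {min_len} nt")
    if abs(gc_fraction(fwd) - gc_fraction(rev)) > max_gc_diff:
        problems.append(f"G/C contents differ by more than {max_gc_diff:.0%}")
    return problems


# Hypothetical 20-nt primers with equal G/C content.
print(check_pair("ATGCATGCATGCATGCATGC", "GCATGCATGCATGCATGCAT"))
print(wallace_tm("ATGCATGCATGCATGCATGC"))
```

For longer primers or precise work, a nearest-neighbor thermodynamic Tm calculation would be more appropriate than the Wallace rule.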



Example 3: Gene expression from DNA Sequences Corresponding to ORFs 

A fragment of an E. coli J96 PAI (preferably, a protein-encoding sequence
provided in Tables 1 through 6) is introduced into an expression vector using
conventional technology (techniques to transfer cloned sequences into expression
vectors that direct protein translation in mammalian, yeast, insect or bacterial
expression systems are well known in the art). Commercially available vectors
and expression systems are available from a variety of suppliers including
Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen
(San Diego, California). If desired, to enhance expression and facilitate proper
protein folding, the codon context and codon pairing of the sequence may be
optimized for the particular expression organism, as explained by Hatfield et al.,
U.S. Pat. No. 5,082,767, which is hereby incorporated by reference.

The following is provided as one exemplary method to generate
polypeptide(s) from a cloned ORF of an E. coli J96 PAI whose sequence is
provided in SEQ ID NOs: 1 through 142. A poly A sequence can be added to the
construct by, for example, splicing out the poly A sequence from pSG5
(Stratagene) using BglI and SalI restriction endonuclease enzymes and
incorporating it into the mammalian expression vector pXT1 (Stratagene) for use
in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the
gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in
the construct allows efficient stable transfection. The vector includes the Herpes
Simplex thymidine kinase promoter and the selectable neomycin gene. The E.
coli J96 PAI DNA is obtained by PCR from the bacterial vector using
oligonucleotide primers complementary to the E. coli J96 PAI DNA and
containing restriction endonuclease sequences for PstI incorporated into the 5'
primer and BglII at the 5' end of the corresponding E. coli J96 PAI DNA 3'
primer, taking care to ensure that the E. coli J96 PAI DNA is positioned such that
it is followed by the poly A sequence. The purified fragment obtained from the
resulting PCR reaction is digested with PstI, blunt ended with an exonuclease,
digested with BglII, purified and ligated to pXT1, now containing a poly A
sequence and digested with BglII.
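The cloning scheme above only works if the amplified ORF does not itself contain a PstI or BglII site, because an internal site would be cut during digestion. A minimal pre-check is sketched below. The recognition sequences are the standard ones for these enzymes, and SalI and XbaI (used in Example 4) are included for convenience. The function name and the demonstration insert are hypothetical.

```python
# Recognition sequences (5'->3'); all four are palindromic, so scanning one
# strand also finds sites on the complementary strand.
SITES = {"PstI": "CTGCAG", "BglII": "AGATCT", "SalI": "GTCGAC", "XbaI": "TCTAGA"}


def internal_sites(insert: str, enzymes=("PstI", "BglII")) -> dict:
    """Return {enzyme: [0-based positions]} for sites found inside the insert."""
    seq = insert.upper()
    found = {}
    for name in enzymes:
        site = SITES[name]
        positions = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
        if positions:
            found[name] = positions
    return found


# Hypothetical insert containing one PstI and one BglII site.
print(internal_sites("AAACTGCAGTTTAGATCTA"))
```

An empty result means the chosen enzymes can be added to the primers without cutting inside the ORF. If a site is found, a different enzyme pair would be chosen.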

The ligated product is transfected into mouse NIH 3T3 cells using
Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions
outlined in the product specification. Positive transfectants are selected after
growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).
The protein is preferably released into the supernatant. However, if the protein
has membrane binding domains, the protein may additionally be retained within
the cell or expression may be restricted to the cell surface.

Since it may be necessary to purify and locate the transfected product,
synthetic 15-mer peptides synthesized from the predicted E. coli J96 PAI DNA
sequence are injected into mice to generate antibody to the polypeptide encoded
by the E. coli J96 PAI DNA.

If antibody production is not possible, the E. coli J96 PAI DNA sequence
is additionally incorporated into eukaryotic expression vectors and expressed as
a chimera with, for example, β-globin. Antibody to β-globin is used to purify the
chimera. Corresponding protease cleavage sites engineered between the β-globin
gene and the E. coli J96 PAI DNA are then used to separate the two polypeptide
fragments from one another after translation. One useful expression vector for
generating β-globin chimeras is pSG5 (Stratagene). This vector encodes rabbit
β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases
the level of expression. These techniques as described are well known to those
skilled in the art of molecular biology. Standard methods are available from the
technical assistance representatives from Stratagene, Life Technologies, Inc., or
Promega. Polypeptides may additionally be produced from either construct using
in vitro translation systems such as the In vitro Express™ Translation Kit
(Stratagene).

Example 4 

E. coli Expression of an E. coli J96 PAI ORF and protein purification 

An E. coli J96 PAI ORF described in Tables 1 through 6 is selected and 
amplified using PCR oligonucleotide primers designed from the nucleotide 
sequences flanking the selected ORF and/or from portions of the ORF's NH2- or
COOH-terminus. Additional nucleotides containing restriction sites to facilitate 
cloning are added to the 5' and 3' sequences, respectively. 

The restriction sites are selected to be convenient to restriction sites in the
bacterial expression vector pQE60. The bacterial expression vector pQE60 is
used for bacterial expression in this example (QIAGEN, Inc., 9259 Eton Avenue,
Chatsworth, CA, 91311). pQE60 encodes ampicillin antibiotic resistance
("Ampr") and contains a bacterial origin of replication ("ori"), an IPTG-inducible
promoter, a ribosome binding site ("RBS"), six codons encoding histidine
residues that allow affinity purification using nickel-nitrilo-tri-acetic acid ("Ni-NTA")
affinity resin sold by QIAGEN, Inc., supra, and suitable single restriction
enzyme cleavage sites. These elements are arranged such that a DNA fragment
encoding a polypeptide may be inserted in such a way as to produce that
polypeptide with the six His residues (i.e., a "6 X His tag") covalently linked to
the carboxyl terminus of that polypeptide.

The DNA sequence encoding the desired portion of an E. coli J96 PAI is 
amplified from the deposited cDNA clone using PCR oligonucleotide primers 
which anneal to the amino terminal sequences of the desired portion of the E. coli 
protein and to sequences in the deposited construct 3' to the cDNA coding 
sequence. Additional nucleotides containing restriction sites to facilitate cloning 
in the pQE60 vector are added to the 5' and 3' sequences, respectively. 

The amplified E. coli J96 PAI DNA fragments and the vector pQE60 are 
digested with one or more appropriate restriction enzymes, such as SalI and XbaI,
and the digested DNAs are then ligated together. Insertion of the E. coli J96 PAI 
DNA into the restricted pQE60 vector places the E. coli J96 PAI protein coding 
region, including its associated stop codon, downstream from the IPTG-inducible 
promoter and in-frame with an initiating AUG. The associated stop codon 
prevents translation of the six histidine codons downstream of the insertion point. 
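Whether the C-terminal His codons are translated depends on the reading frame and on where the first in-frame stop codon falls. The following is a hedged sketch of that check; the toy construct layout and the use of CAC for every His codon are simplifying assumptions, not the actual pQE60 sequence.

```python
# Sketch: decide whether translation starting at an ATG reaches a downstream
# 6xHis tag, or stops first at an in-frame stop codon carried by the insert.
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS_TAG = "CAC" * 6  # assumed His codons; real vectors may mix CAT/CAC


def first_in_frame_stop(seq: str, start: int) -> int:
    """Index of the first stop codon in frame with `start`, or -1 if none."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


def his_tag_translated(construct: str, atg: int, tag: int) -> bool:
    """True if the tag is in frame with the ATG and no stop precedes it."""
    if (tag - atg) % 3 != 0:
        return False  # tag lies in a different reading frame
    stop = first_in_frame_stop(construct, atg)
    return stop == -1 or stop >= tag


if __name__ == "__main__":
    insert = "ATG" + "GCT" * 2
    with_stop = insert + "TAA" + HIS_TAG + "TAA"
    without_stop = insert + HIS_TAG + "TAA"
    print(his_tag_translated(with_stop, 0, len(insert) + 3))
    print(his_tag_translated(without_stop, 0, len(insert)))
```

The first construct mirrors the case described above (insert carrying its own stop codon, so no tag); the second shows the fused-tag arrangement the vector is designed for.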

The ligation mixture is transformed into competent E. coli cells using
standard procedures such as those described in Sambrook et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY (1989). E. coli strain M15/rep4, containing multiple
copies of the plasmid pREP4, which expresses the lac repressor and confers
kanamycin resistance ("Kanr"), is used in carrying out the illustrative example
described herein. This strain, which is only one of many that are suitable for
expressing an E. coli J96 PAI protein, is available commercially from QIAGEN,
Inc., supra. Transformants are identified by their ability to grow on LB plates in
the presence of ampicillin and kanamycin. Plasmid DNA is isolated from
resistant colonies, and the identity of the cloned DNA is confirmed by restriction
analysis, PCR and DNA sequencing.
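Restriction analysis works by comparing observed band sizes with those predicted from the expected construct. A minimal Python sketch of that prediction for a circular plasmid follows; it approximates each cut position by the start of the recognition sequence and ignores sites that span the sequence origin, both simplifying assumptions.

```python
# Sketch: predict fragment sizes from digesting a circular plasmid, to
# compare with bands seen on a gel.
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def circular_fragments(seq: str, sites: list[str]) -> list[int]:
    """Sorted fragment lengths after cutting a circular molecule at every
    occurrence of the given sites (cut approximated at each site's start)."""
    n = len(seq)
    cuts = sorted({p for s in sites for p in find_sites(seq, s)})
    if not cuts:
        return [n]  # uncut: a single circular molecule
    return sorted((cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n
                  for i in range(len(cuts)))


if __name__ == "__main__":
    toy_plasmid = "GTCGAC" + "A" * 44 + "TCTAGA" + "C" * 150  # hypothetical
    print(circular_fragments(toy_plasmid, ["GTCGAC", "TCTAGA"]))
```

A single cut simply linearizes the plasmid, so one band of full length is expected; two cuts release the insert-containing fragment and the vector backbone.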

Clones containing the desired constructs are grown overnight ("O/N") in
liquid culture in LB media supplemented with both ampicillin (100 µg/ml) and
kanamycin (25 µg/ml). The O/N culture is used to inoculate a large culture at a
dilution of approximately 1:25 to 1:250. The cells are grown to an optical density
at 600 nm ("OD600") of between 0.4 and 0.6. Isopropyl-β-D-thiogalactopyranoside
("IPTG") is then added to a final concentration of 1 mM to induce transcription
from the lac repressor-sensitive promoter by inactivating the lacI repressor.
The cells are then incubated for a further 3 to 4 hours and harvested by
centrifugation.
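The culture volumes in this step follow from simple dilution arithmetic (C1·V1 = C2·V2). The sketch below assumes that a 1:N dilution means one part culture in N parts total, and uses a 1 M IPTG stock and a 25 mg/ml kanamycin stock; these stock strengths are illustrative, not from the source, and the small volume added by each reagent is neglected.

```python
# Sketch of the dilution arithmetic for inoculation and induction.
def inoculum_ml(final_volume_ml: float, dilution: float) -> float:
    """Overnight culture needed for a 1:dilution inoculation
    (one part culture in `dilution` parts total, an assumption)."""
    return final_volume_ml / dilution


def stock_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Solve C1*V1 = C2*V2 for V1; both concentrations must share units.
    Neglects the volume contributed by the stock itself."""
    return final_conc * final_volume_ml / stock_conc


if __name__ == "__main__":
    for d in (25, 250):
        print(f"1:{d} into 1 L: {inoculum_ml(1000, d):.0f} ml overnight culture")
    # 1 mM IPTG from an assumed 1 M (1000 mM) stock, 1 L culture
    print(f"IPTG stock: {stock_ml(1, 1000, 1000):.1f} ml")
    # 25 ug/ml kanamycin from an assumed 25 mg/ml (25,000 ug/ml) stock
    print(f"Kanamycin stock: {stock_ml(25, 25_000, 1000):.1f} ml")
```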

The cells are then stirred for 3-4 hours at 4 °C in 6 M guanidine-HCl, pH 8.
The cell debris is removed by centrifugation, and the supernatant containing the
E. coli J96 PAI protein is dialyzed against 50 mM Na-acetate buffer, pH 6,
supplemented with 200 mM NaCl. Alternatively, the protein can be successfully
refolded by dialyzing it against 500 mM NaCl, 20% glycerol, 25 mM Tris/HCl,
pH 7.4, containing protease inhibitors. After renaturation the protein can be
purified by ion exchange, hydrophobic interaction and size exclusion
chromatography. Alternatively, an affinity chromatography step such as an
antibody column can be used to obtain pure E. coli J96 PAI protein. The purified
protein is stored at 4 °C or frozen at -80 °C.



Example 5 

Cloning and Expression of an E. coli J96 PAI protein in a Baculovirus Expression System



An E. coli J96 PAI ORF described in Tables 1 through 6 is selected and
amplified as above. The plasmid is digested with appropriate restriction enzymes
and, optionally, can be dephosphorylated using calf intestinal phosphatase, using
routine procedures known in the art. The DNA is then isolated from a 1%
agarose gel using a commercially available kit ("Geneclean," BIO 101 Inc., La
Jolla, CA). This vector DNA is designated herein "V1".

Fragment F1 and the dephosphorylated plasmid V1 are ligated together
with T4 DNA ligase. E. coli HB101 or other suitable E. coli hosts such as XL-1
Blue (Stratagene Cloning Systems, La Jolla, CA) cells are transformed with the
ligation mixture and spread on culture plates. Bacteria containing the plasmid
with the E. coli J96 PAI gene are identified by digesting DNA from individual
colonies using appropriate restriction enzymes and then analyzing the digestion
products by gel electrophoresis. The sequence of the cloned fragment is confirmed
by DNA sequencing. This plasmid is designated herein pBac E. coli J96.

Five µg of the plasmid pBac E. coli J96 is co-transfected with 1.0 µg of
a commercially available linearized baculovirus DNA ("BaculoGold™
baculovirus DNA", Pharmingen, San Diego, CA), using the lipofection method
described by Felgner et al., Proc. Natl. Acad. Sci. USA 84:7413-7417 (1987). One
µg of BaculoGold™ virus DNA and 5 µg of the plasmid pBac E. coli J96 are
mixed in a sterile well of a microtiter plate containing 50 µl of serum-free Grace's
medium (Life Technologies Inc., Gaithersburg, MD). Afterwards, 10 µl
Lipofectin plus 90 µl Grace's medium are added, mixed and incubated for 15
minutes at room temperature. Then the transfection mixture is added drop-wise
to Sf9 insect cells (ATCC CRL 1711) seeded in a 35 mm tissue culture plate with
1 ml Grace's medium without serum. The plate is rocked back and forth to mix
the newly added solution. The plate is then incubated for 5 hours at 27 °C. After
5 hours the transfection solution is removed from the plate and 1 ml of Grace's
insect medium supplemented with 10% fetal calf serum is added. The plate is put
back into an incubator and cultivation is continued at 27 °C for four days.

After four days the supernatant is collected and a plaque assay is
performed, as described by Summers and Smith, supra. An agarose gel with
"Blue Gal" (Life Technologies Inc.) is used to allow easy identification and
isolation of gal-expressing clones, which produce blue-stained plaques. (A
detailed description of a "plaque assay" of this type can also be found in the user's
guide for insect cell culture and baculovirology distributed by Life Technologies
Inc., pages 9-10.) After appropriate incubation, blue-stained plaques are picked
with the tip of a micropipettor (e.g., Eppendorf). The agar containing the
recombinant viruses is then resuspended in a microcentrifuge tube containing 200
µl of Grace's medium, and the suspension containing the recombinant baculovirus
is used to infect Sf9 cells seeded in 35 mm dishes. Four days later the
supernatants of these culture dishes are harvested and stored at 4 °C.
The recombinant virus is called V-E. coli J96.

To verify the expression of the E. coli gene, Sf9 cells are grown in Grace's
medium supplemented with 10% heat-inactivated FBS. The cells are infected
with the recombinant baculovirus V-E. coli J96 at a multiplicity of infection
("MOI") of about 2. Six hours later the medium is removed and replaced with
SF900 II medium minus methionine and cysteine (available from Life
Technologies Inc.). If radiolabeled proteins are desired, 5 µCi of
35S-methionine and 5 µCi of 35S-cysteine (available from Amersham) are added
42 hours later. The cells are further incubated for 16 hours and then harvested by
centrifugation. The proteins in the supernatant as well as the intracellular proteins
are analyzed by SDS-PAGE followed by autoradiography (if radiolabeled).
Microsequencing of the amino terminus of the purified protein may be used to
determine the amino-terminal sequence of the mature protein and thus the
cleavage point and length of the secretory signal peptide.
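Choosing the inoculum for a given MOI is a short calculation and, assuming infection follows Poisson statistics (a standard simplification), the MOI also predicts the fraction of cells infected. In this sketch the cell count and virus titer are hypothetical values, not figures from the source.

```python
import math


def virus_volume_ml(moi: float, cells: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock delivering `moi` plaque-forming units per cell."""
    return moi * cells / titer_pfu_per_ml


def fraction_infected(moi: float) -> float:
    """Poisson estimate of the fraction of cells receiving at least one virion."""
    return 1.0 - math.exp(-moi)


if __name__ == "__main__":
    cells = 2e6   # hypothetical Sf9 cells per 35 mm dish
    titer = 1e8   # hypothetical recombinant virus titer, pfu/ml
    print(f"{virus_volume_ml(2, cells, titer):.3f} ml of stock")
    print(f"~{fraction_infected(2):.0%} of cells infected at MOI 2")
```

Under this model an MOI of about 2 leaves roughly one cell in seven uninfected, which is one reason higher MOIs are sometimes chosen for synchronous expression.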



Example 6 
Cloning and Expression in Mammalian Cells 



Most of the vectors used for the transient expression of an E. coli J96 PAI
gene in mammalian cells should carry the SV40 origin of replication. This allows
the replication of the vector to high copy numbers in cells (e.g., COS cells) which
express the T antigen required for the initiation of viral DNA synthesis. Any
other mammalian cell line can also be utilized for this purpose.

A typical mammalian expression vector contains the promoter element,
which mediates the initiation of transcription of mRNA, the protein coding
sequence, and signals required for the termination of transcription and
polyadenylation of the transcript. Additional elements include enhancers, Kozak
sequences and intervening sequences flanked by donor and acceptor sites for
RNA splicing. Highly efficient transcription can be achieved with the early and
late promoters from SV40, the long terminal repeats (LTRs) from retroviruses,
e.g., RSV, HTLVI and HIVI, and the early promoter of the cytomegalovirus (CMV).
However, cellular elements can also be used (e.g., the human actin promoter).
Suitable expression vectors for use in practicing the present invention include, for
example, vectors such as pSVL and pMSG (Pharmacia, Uppsala, Sweden),
pRSVcat (ATCC 37152), pSV2dhfr (ATCC 37146) and pBC12MI (ATCC
67109). Mammalian host cells that could be used include human HeLa, 293, H9
and Jurkat cells, mouse NIH3T3 and C127 cells, COS-1, COS-7 and CV-1 cells, quail
QC1-3 cells, mouse L cells and Chinese hamster ovary (CHO) cells.

Alternatively, the gene can be expressed in stable cell lines that contain
the gene integrated into a chromosome. Co-transfection with a selectable
marker such as dhfr, gpt, neomycin or hygromycin allows the identification and
isolation of the transfected cells.

The transfected gene can also be amplified to express large amounts of the
encoded protein. The DHFR (dihydrofolate reductase) marker is useful to
develop cell lines that carry several hundred or even several thousand copies of
the gene of interest. Another useful selection marker is the enzyme glutamine
synthase (GS) (Murphy et al., Biochem. J. 227:277-279 (1991); Bebbington et al.,
Bio/Technology 10:169-175 (1992)). Using these markers, the mammalian cells
are grown in selective medium and the cells with the highest resistance are
selected. These cell lines contain the amplified gene(s) integrated into a
chromosome. Chinese hamster ovary (CHO) and NS0 cells are often used for the
production of proteins.

The expression vectors pC1 and pC4 contain the strong promoter (LTR)
of the Rous Sarcoma Virus (Cullen et al., Molecular and Cellular Biology, 438-447
(March, 1985)) plus a fragment of the CMV enhancer (Boshart et al., Cell
47:521-530 (1985)). Multiple cloning sites, e.g., with the restriction enzyme
cleavage sites BamHI, XbaI and Asp718, facilitate the cloning of the gene of
interest. The vectors contain, in addition, the 3' intron and the polyadenylation and
termination signals of the rat preproinsulin gene.



Example 6(a): Cloning and Expression in COS Cells 

The expression plasmid, pE. coli J96HA, is made by cloning a cDNA
encoding E. coli J96 PAI protein into the expression vector pcDNAI/Amp or
pcDNAIII (which can be obtained from Invitrogen, Inc.).

The expression vector pcDNAI/Amp contains: (1) an E. coli origin of
replication effective for propagation in E. coli and other prokaryotic cells; (2) an
ampicillin resistance gene for selection of plasmid-containing prokaryotic cells;
(3) an SV40 origin of replication for propagation in eukaryotic cells; (4) a CMV
promoter, a polylinker and an SV40 intron; and (5) several codons encoding a
hemagglutinin fragment (i.e., an "HA" tag to facilitate purification) followed by
a termination codon and polyadenylation signal, arranged so that a cDNA can be
conveniently placed under expression control of the CMV promoter and operably
linked to the SV40 intron and the polyadenylation signal by means of restriction
sites in the polylinker. The HA tag corresponds to an epitope derived from the
influenza hemagglutinin protein described by Wilson et al., Cell 37:767 (1984).
The fusion of the HA tag to the target protein allows easy detection and recovery
of the recombinant protein with an antibody that recognizes the HA epitope.
pcDNAIII contains, in addition, the selectable neomycin marker.

A DNA fragment encoding the E. coli J96 PAI protein is cloned into the
polylinker region of the vector so that recombinant protein expression is directed
by the CMV promoter. The plasmid construction strategy is as follows. The
E. coli cDNA of the deposited clone is amplified using primers that contain
convenient restriction sites, much as described above for construction of vectors
for expression of E. coli J96 PAI protein in E. coli.

The PCR-amplified DNA fragment and the vector, pcDNAI/Amp, are
digested with appropriate restriction enzymes for the chosen primer sequences
and then ligated. The ligation mixture is transformed into E. coli strain SURE
(available from Stratagene Cloning Systems, La Jolla, CA 92037), and the
transformed culture is plated on ampicillin media plates, which are then incubated
to allow growth of ampicillin-resistant colonies. Plasmid DNA is isolated from
resistant colonies and examined by restriction analysis or other means for the
presence of the E. coli J96 PAI protein-encoding fragment.

For expression of recombinant E. coli J96 PAI protein, COS cells are
transfected with an expression vector, as described above, using DEAE-dextran,
as described, for instance, in Sambrook et al., Molecular Cloning:
A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York (1989). Cells are incubated under conditions for expression of E. coli J96
PAI protein by the vector.

Expression of the E. coli J96 PAI-HA fusion protein is detected by
radiolabeling and immunoprecipitation, using methods described in, for example,
Harlow et al., Antibodies: A Laboratory Manual, 2nd Ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, New York (1988). To this end, two days
after transfection, the cells are labeled by incubation in media containing
35S-cysteine for 8 hours. The cells and the media are collected, and the cells are
washed and then lysed with detergent-containing RIPA buffer (150 mM NaCl, 1%
NP-40, 0.1% SDS, 0.5% DOC, 50 mM Tris, pH 7.5), as described by
Wilson et al., cited above. Proteins are precipitated from the cell lysate and from
the culture media using an HA-specific monoclonal antibody. The precipitated
proteins are then analyzed by SDS-PAGE and autoradiography. An expression
product of the expected size is seen in the cell lysate, which is not seen in
negative controls.
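Making up the RIPA buffer is again C1·V1 = C2·V2, applied once per component. The sketch assumes common stock strengths (5 M NaCl, 10% NP-40, 10% SDS, 10% deoxycholate, 1 M Tris pH 7.5) that the source does not specify, and counts NP-40 once at 1%.

```python
# Sketch: volumes of assumed stock solutions for a batch of RIPA buffer.
# Each entry: (component, final concentration, stock concentration);
# the two concentrations of a component share the same unit.
RIPA = [
    ("NaCl (mM)", 150, 5000),
    ("NP-40 (% v/v)", 1, 10),
    ("SDS (% w/v)", 0.1, 10),
    ("deoxycholate (% w/v)", 0.5, 10),
    ("Tris pH 7.5 (mM)", 50, 1000),
]


def recipe(components, total_ml: float) -> dict:
    """Stock volume (ml) per component, plus water to reach total_ml."""
    volumes = {name: final * total_ml / stock for name, final, stock in components}
    volumes["water"] = total_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ml in recipe(RIPA, 100).items():
        print(f"{name:>22}: {ml:5.1f} ml")
```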

Example 6(b): Cloning and Expression in CHO Cells 

The vector pC4 is used for the expression of an E. coli J96 PAI protein.
Plasmid pC4 is a derivative of the plasmid pSV2-dhfr (ATCC Acc. No. 37146).
The plasmid contains the mouse DHFR gene under control of the SV40 early
promoter. Chinese hamster ovary or other cells lacking dihydrofolate reductase
activity that are transfected with these plasmids can be selected by growing the
cells in a selective medium (alpha minus MEM, Life Technologies, Inc.) supplemented
with the chemotherapeutic agent methotrexate. The amplification of the DHFR
genes in cells resistant to methotrexate (MTX) has been well documented (see,
e.g., Alt, F. W. et al., 1978, J. Biol. Chem. 253:1357-1370; Hamlin, J. L. and Ma,
C. 1990, Biochim. et Biophys. Acta, 1097:107-143; Page, M. J. and Sydenham,
M. A. 1991, Biotechnology 9:64-68). Cells grown in increasing concentrations of
MTX develop resistance to the drug by overproducing the target enzyme, DHFR,
as a result of amplification of the DHFR gene. If a second gene is linked to the
DHFR gene, it is usually co-amplified and over-expressed. It is known in the art
that this approach may be used to develop cell lines carrying more than 1,000
copies of the amplified gene(s). Subsequently, when the methotrexate is
withdrawn, cell lines are obtained which contain the amplified gene integrated
into one or more chromosome(s) of the host cell.

For expressing the gene of interest, plasmid pC4 contains the strong
promoter of the long terminal repeat (LTR) of the Rous Sarcoma Virus (Cullen
et al., Molecular and Cellular Biology, March 1985:438-447) plus a fragment
isolated from the enhancer of the immediate early gene of human
cytomegalovirus (CMV) (Boshart et al., Cell 47:521-530 (1985)). Downstream
of the promoter is a BamHI restriction enzyme site that allows integration of the
gene. Downstream of this cloning site, the plasmid contains the 3' intron and
polyadenylation site of the rat preproinsulin gene. Other high-efficiency
promoters can also be used for the expression, e.g., the human β-actin promoter,
the SV40 early or late promoters or the long terminal repeats from other
retroviruses, e.g., HIV and HTLVI. Clontech's Tet-Off and Tet-On gene
expression systems and similar systems can be used to express the E. coli protein
in a regulated way in mammalian cells (Gossen, M., & Bujard, H. 1992, Proc.
Natl. Acad. Sci. USA 89:5547-5551). For the polyadenylation of the mRNA,
other signals, e.g., from the human growth hormone or globin genes, can be used
as well. Stable cell lines carrying a gene of interest integrated into the
chromosomes can also be selected upon co-transfection with a selectable marker
such as gpt, G418 or hygromycin. It is advantageous to use more than one
selectable marker in the beginning, e.g., G418 plus methotrexate.

The plasmid pC4 is digested with appropriate restriction enzymes and
then dephosphorylated using calf intestinal phosphatase by procedures known in
the art. The vector is then isolated from a 1% agarose gel.

The DNA sequence encoding the complete E. coli J96 PAI protein
including its leader sequence is amplified using PCR oligonucleotide primers
corresponding to the 5' and 3' sequences of the gene.

The amplified fragment is digested with appropriate endonucleases for the
chosen primers and then purified again on a 1% agarose gel. The isolated
fragment and the dephosphorylated vector are then ligated with T4 DNA ligase.
E. coli HB101 or XL-1 Blue cells are then transformed, and bacteria that contain
the fragment inserted into plasmid pC4 are identified using, for instance,
restriction enzyme analysis.

Chinese hamster ovary cells lacking an active DHFR gene are used for
transfection. Five µg of the expression plasmid pC4 is cotransfected with 0.5 µg of
the plasmid pSV2-neo using Lipofectin (Felgner et al., supra). The plasmid pSV2-neo
contains a dominant selectable marker, the neo gene from Tn5, encoding an
enzyme that confers resistance to a group of antibiotics including G418. The cells
are seeded in alpha minus MEM supplemented with 1 mg/ml G418. After 2 days,
the cells are trypsinized and seeded in hybridoma cloning plates (Greiner,
Germany) in alpha minus MEM supplemented with 10, 25, or 50 ng/ml of
methotrexate plus 1 mg/ml G418. After about 10-14 days single clones are
trypsinized and then seeded in 6-well petri dishes or 10 ml flasks using different
concentrations of methotrexate (50 nM, 100 nM, 200 nM, 400 nM, 800 nM).
Clones growing at the highest concentrations of methotrexate are then transferred
to new 6-well plates containing even higher concentrations of methotrexate
(1 µM, 2 µM, 5 µM, 10 µM, 20 µM). The same procedure is repeated until
clones are obtained which grow at a concentration of 100-200 µM. Expression
of the desired gene product is analyzed, for instance, by SDS-PAGE and Western
blot or by reversed-phase HPLC analysis.
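A stepwise methotrexate escalation is easier to compare once every step is expressed in one unit. This hypothetical helper converts (value, unit) pairs to nM and reports the fold increase between steps; the schedule in the code is illustrative, built from nM and µM steps of the kind described above, with 100 and 200 µM appended as end points.

```python
# Sketch: express a methotrexate escalation schedule in nM and report the
# fold increase at each step. The schedule below is illustrative.
UNIT_TO_NM = {"nM": 1, "uM": 1_000, "mM": 1_000_000}


def to_nm(value: float, unit: str) -> float:
    """Convert a concentration to nanomolar."""
    return value * UNIT_TO_NM[unit]


def fold_steps(schedule):
    """Fold change between consecutive (value, unit) concentrations."""
    nm = [to_nm(v, u) for v, u in schedule]
    return [b / a for a, b in zip(nm, nm[1:])]


SCHEDULE = [(50, "nM"), (100, "nM"), (200, "nM"), (400, "nM"), (800, "nM"),
            (1, "uM"), (2, "uM"), (5, "uM"), (10, "uM"), (20, "uM"),
            (100, "uM"), (200, "uM")]

if __name__ == "__main__":
    for (v, u), fold in zip(SCHEDULE[1:], fold_steps(SCHEDULE)):
        print(f"{v} {u}: x{fold:g} over previous step")
```

Each step stays within a 1.25- to 5-fold increase; a mistyped unit (for example mM in place of µM) would show up immediately as a jump of several hundredfold.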

Example 7 

Production of an Antibody to an E. coli J96 Pathogenicity Island Protein 

Substantially pure E. coli J96 PAI protein or polypeptide is isolated from
the transfected or transformed cells described above using an art-known method.
The protein can also be chemically synthesized. The concentration of protein in the
final preparation is adjusted, for example, by concentration on an Amicon filter
device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody
to the protein can then be prepared as follows:

Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes of any of the peptides identified and
isolated as described can be prepared from murine hybridomas according to the
classical method of Kohler and Milstein, Nature 256:495 (1975), or modifications
thereof. Briefly, a mouse is repetitively inoculated with a few
micrograms of the selected protein over a period of a few weeks. The mouse is
then sacrificed, and the antibody-producing cells of the spleen are isolated. The
spleen cells are fused by means of polyethylene glycol with mouse myeloma cells,
and the excess unfused cells are destroyed by growth of the system on selective
media comprising aminopterin (HAT media). The successfully fused cells are
diluted and aliquots of the dilution placed in wells of a microtiter plate, where
growth of the culture is continued. Antibody-producing clones are identified by
detection of antibody in the supernatant fluid of the wells by immunoassay
procedures, such as ELISA, as originally described by Engvall, E., Meth.
Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones
can be expanded and their monoclonal antibody product harvested for use.
Detailed procedures for monoclonal antibody production are described in Davis,
L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2
(1989).
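The source does not say how far the fused cells are diluted. One common way to reason about this step, offered here as an assumption rather than the method of this example, is limiting dilution with Poisson statistics: at a mean of λ cells per well, a fraction e^-λ of wells receive no cells, and only part of the wells that show growth were seeded by a single cell.

```python
import math


def well_distribution(mean_cells: float) -> dict:
    """Poisson fractions of wells receiving 0, exactly 1, or more than 1 cell."""
    p0 = math.exp(-mean_cells)
    p1 = mean_cells * math.exp(-mean_cells)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


def monoclonal_fraction(mean_cells: float) -> float:
    """Among wells with growth, the fraction seeded by exactly one cell."""
    d = well_distribution(mean_cells)
    return d["single"] / (1.0 - d["empty"])


if __name__ == "__main__":
    for lam in (0.3, 1.0, 3.0):
        print(f"mean {lam}: {monoclonal_fraction(lam):.2f} of positive wells single-cell")
```

Lower seeding densities leave more empty wells but make each positive well more likely to be clonal, which is why positive clones are often recloned.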

Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogeneous epitopes of
a single protein can be prepared by immunizing suitable animals with the
expressed protein described above, which can be unmodified or modified to
enhance immunogenicity. Effective polyclonal antibody production is affected
by many factors related both to the antigen and the host species. For example,
small molecules tend to be less immunogenic than other molecules and may
require the use of carriers and adjuvant. Also, host animals vary in response to
site of inoculation and dose, with both inadequate and excessive doses of antigen
resulting in low-titer antisera. Small doses (ng level) of antigen administered at
multiple intradermal sites appear to be most reliable. An effective immunization
protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol.
Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum
harvested when its antibody titer, as determined semi-quantitatively, for
example, by double immunodiffusion in agar against known concentrations of the
antigen, begins to fall (see Ouchterlony, O. et al., Chap. 19 in: Handbook of
Experimental Immunology, Weir, D., ed., Blackwell (1973)). Plateau
concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum
(about 1 µM). Affinity of the antisera for the antigen is determined by preparing
competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in:
Manual of Clinical Immunology, 2nd ed., Rose and Friedman (eds.), Amer. Soc.
for Microbiol., Washington, D.C. (1980).
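The plateau range can be converted to molar terms if one assumes the antibody is IgG with a molecular mass of about 150 kDa (an assumption; the source does not state it). Because 1 mg/ml equals 1 g/L, dividing by the molar mass gives mol/L:

```python
def mg_per_ml_to_um(conc_mg_per_ml: float, mass_da: float = 150_000) -> float:
    """Convert a protein concentration in mg/ml to micromolar.
    mg/ml == g/L; g/L divided by g/mol gives mol/L; x 1e6 gives umol/L."""
    return conc_mg_per_ml / mass_da * 1e6


if __name__ == "__main__":
    for c in (0.1, 0.2):
        print(f"{c} mg/ml IgG ~ {mg_per_ml_to_um(c):.2f} uM")
```

With these assumptions, 0.1 to 0.2 mg/ml works out to roughly 0.7 to 1.3 µM.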

Antibody preparations prepared according to either protocol are useful in 
quantitative immunoassays which determine concentrations of antigen-bearing 
substances in biological samples; they are also used semi-quantitatively or 
qualitatively to identify the presence of antigen in a biological sample. 
While the present invention has been described in some detail for
purposes of clarity and understanding, one skilled in the art will appreciate that
various changes in form and detail can be made without departing from the true
scope of the invention.

All patents, patent applications and publications recited herein are hereby
incorporated by reference.
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANTS: Human Genome Sciences, Inc. 

9410 Key West Avenue
Rockville, Maryland 20850 
United States of America 

University of Wisconsin 
1300 University Avenue 
Madison, Wisconsin 53706 
United States of America 

APPLICANTS / INVENTORS : Dillon, Patrick J. 

Choi, Gil H. 
Welch, Rodney A. 

(ii) TITLE OF INVENTION: Nucleotide Sequence of Escherichia coli 

Pathogenicity Islands 

(iii) NUMBER OF SEQUENCES: 142 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Sterne, Kessler, Goldstein & Fox P.L.L.C. 

(B) STREET: 1100 New York Ave., N.W., Suite 600 

(C) CITY: Washington 

(D) STATE: DC 

(E) COUNTRY: USA 

(F) ZIP: 20005-3934 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

(B) COMPUTER: HP Vectra 486/33 

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: To be assigned 

(B) FILING DATE: Herewith 

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/031,626 AND US 60/061,953 

(B) FILING DATE: 22-NOV-1996 AND 14-OCT-1997 

(viii) ATTORNEY /AGENT INFORMATION: 

(A) NAME: Steffe, Eric K. 

(B) REGISTRATION NUMBER: 36,688 

(C) REFERENCE/DOCKET NUMBER: 1488.074PC02/EKS/CBM

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (202) 371-2600 

(B) TELEFAX: (202) 371-2540 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1178 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 



CNTANATTAG GCCTGCTNAA TGTATTTATA TCTAAAAAAA TTCGCATCCA AAAGGAATCC 60

AATCTGTACT GTTTTTTCTT GTGCTGACAT CTTCTTTTCC CTGGCTGGTA TGGCAAGTGA 120

CGGAGACAAG AGAAACGTTT TAAGCTCAGT TATCTCCGCC ATCACTTTCC ACGAATGACA 180

AGTAATTTTG CCTATTTTAA AACCATGCAA AAGGCAGGGT AAAAGGAGAA AATTCGATCG 240

AATCGATCGA CAAAATCGAT CATACATGAT GAAGATTTCT TATCGAATCC ATAAAAATAG 300

TGACAGCTAA CCGGCGTTGC AGGAACAGTC AGAAATGGGC GTTTGGGAAA GAGCCATAGC 360

ATACGTCGTC GCTGACATAG AGGAACTGTG CTTTGTTGAT AAGATCCTTT ATACGGCAAC 420

CAATCCACTG GACAAAAGAT GAACTACGTA ATCACCGGGT TCTCACTGAC GAAATACAGA 480

AGTTAATGAC ACAACTGTGC CATGCACCTT GTACAACAGC GGTGGAAAGC TCTCAGAACA 540

ATGGAATTGC AGAAAGGTGT TAAAACGATG AAAGCCTTCA TACCCAAATC GAATGTAAGA 600

ACGGCAGTAA AGACTGAATT GCGTAACCTT GCAGTAGCTC GAGTATTACA CTGCATAGTG 660

[30 nucleotides illegible in source] GGCGCCAGCG AATAACGTCA CCTTAGATGT 720

AGCAGTTGCC AAATAGTGAC TCAAGGGCGG GCTTACCGCA TACACTGACA CTTAGCGGAT 780

CGACAGAATA TTATTAGCAG ATCATCACTG AACGCTACGT AATTATCGTA ATAAAGGCTT 840

TTTCTGGCTA CCAGGAAGAC CTGACATGGC TCTGCTCTGG AACCAGGCCG CAGGAAGCAT 900

CAATCTGGAG TTTATCAGCT ACTGGAATTC CGGTGTATTG GCAGCCCCTG ATAATCACCT 960

GACCCACGAA GAGCGCTCTG CTTTGCAGAA ACTCTGGGGC GGTTTGGAGA CAGGAGATGT 1020

AACGATTATA GGACGTTCTG ATGAAGTCCA TGATTTTACC TCCGCCTTAA TTAACTGTTT 1080

TCTTTCTGAA GAAGAAATTG TCTGGTGGCA ATCAGGTGGC ATTTTCCCGG ATCCTTGGCC 1140

CGCTAATATA TCCCGGCTGA ACTGACGATT AACGCGAT 1178



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 414 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
ATCCTATTCA TTTTGCCATG ACGGGCGAAC TCCAGATAAA GGTTTTGAAA GTAATGAGAA 60

ATTATTAATT CATCCATGTT ACTGGCTTGG TTTGAATCTA AATCGTAATG CACTTGCTCC 120

AGAGGAAGCA GAGGAGATAA ATGACGAATA TGATATTAAT ATTATTTCAG ATAATTCAGC 180

CATTAGAAAT AAAACAATAG GTCAAATAAC TACTCATCTA GATCAGATAC CGATAGGAAA 240

TGAAGGTGCC ACTGAATTTG AACAATGGTG TTTAGACGCA CTAAGAATAG TATTTGCATC 300

CCACCTAACA GACATCAAGT CCCATCCAAA TGGTAACGCA GTTCAGAGAC GAGATATTAT 360

AGGCACCAAT GGTGGCAAAT CTGAWTTTTG GRAACGAGTA TTGGAGGACT ATAA 414
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8752 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

TTGGGATCTG GTACANTCCA CCCAGCGGCA TTATCCNGAA GGCAATATTT TTAAGGATTA 60 

TTCGTCCACA AAATCAGTAC TGGAACCAGG CTCAAAAAAG GCTTTAACGT GACCTGCTNC 120 

CATCTACAGT AGATGTACAA CCTGTTAAGT TAATTGAAAA TGGTGTTAAT CCGGTTGTTT 180 

CTCCAGGGGT AGCAAGGGCC TTATTCGATA CAGTGGGTAA TGTTACTGTA AAATTACCAT 240 

CATTCCCTGT GGTCACATTG CAGGTCTGAG CTACAACTTT GCCTGTAAAC GTAATTGTTC 300

CGTCATAGGC CATAGCTGAA CCAACAAACA CAGCAGAAAC AAATGTAGCC AATGCTATAA 360

CTTTTATTTT CATAAAATGA ATTCCTGTTT AATTCCGGTA TTGATCATTT GTTCAGCAAT 420

CATCCCCAAC AAAACAATCA TTTTCAAAAT GTTTTTACCG ATCGATAACC AGCACATGAT 480

AGATTGCACC TATCATGATT GCTAAAACGA TCGGGAAAAG CGATCAAAAA CCATATTTAT 540

TGTGTTGGTA ATGACAAAAG ATATGCTTTA CCCTGAAATG AGCGACCTAT TCATGAAAAT 600 

ATGTAGGTCT GTATTTGATT ACTATCATTG CTATATTTCC ACTATCCAAT TTATATTTCA 660

TGATTAAAAT ATACCTTTTT ACACTATTAT TTATTTGTTG CAGCTTGCCT GGCTTTATCT 720

TATTCCGACT ATTTTATGGT AGATACAGAA TACAATTAAT TAAACTTATT TAAAGATTTT 780

ATAAATACCA TATTGGAGTT GACCGATAGA TACCTACTAA CAAGAGCAAT CACCACCACC 840

CCATGAGGTG TTTAGGAATA CAATCAATAA ACAACATCCA TGCCCGGCGA CGTACATACC 900 

TGTTTGCTAT GATATCTGTT ACGCTACGCT TGCTAATTTA CTGAAACTCA GCATCTGTCG 960 

ACGGAGATTC GTCCGGGCCC TGATACAACA AGGGCAAGAA AACCACCCGA AATACAGATA 1020 

TTCTTATAAA AATGGATCAT ATTTCCATGT GCAAGTTCAG CTGGCATCGT CCAGAATGCG 1080 

TGTCCAAGAA ATGAAGCAAA CACGGTATAC AGGCACAGAA TAATGCTCAC TGGCCGGGTG 1140

AAAAAGCCRA AAACAATCAT TAATGCTCCA ACGATTTCGA CAAGGACCAC TATTGCTGCA 1200 

GTAATCGCCG GAAATATAAG CCCAAGAGAG GCCATTTTAT CGATAGTGCC AGTGAATGAT 1260

AGCAGCTTGG GAACGCCGGA TATCATATAA AGGCATGCCA GCATCAGACG GGCAAGGAGC 1320

AACAATGCCG ACGTGTAATT TCCCATATTA AAATACCTGA TTTTATCCAC TATCAATGCT 1380

CAGTCTCCTT GTTTCTGATA AAGCCCTGAG CCAAATCCTT AAGTGTACGA GGACCACTCA 1440

GTAACATTGC CGTCCTCAGC TCCGTCTTCA GGTGCTCAAT GACACTGGCA ACGCCCCCGA 1500 

CACCACCTGC TGCGATGCCA TAAAGAACAG GACGTCCGAC CGCAACAGCC GTTGCCCCAA 1560

GAGAGATAGC CCTTACAACA TCAACCCCCC TGCGAATACC GCTGTCAAAA ATGACCGGAA 1620

CTTTGTGCCC GACTCTTGCA GCAACTTCCT GCAACTGGCT GATGGCAGAA GGAACACCAT 1680 

CAATCTGGCG ACCACCATGA TTAGACACCT GGATGGCATC TGCTCCTGCA TCAATGGCGA 1740

CGACTGCATC CTCACCTCTG AGGATGCCCT TGACAATGAC TGGCAGCCCG GTGATTTTTT 1800 

TTACAAACTC AATATCAGCC GGGGTCAGCT CAACTTTTTG GTTAAAAAAA TCACCTTTGC 1860

CACCGTAACG GGGGTCATGA TTACCGAACG TCGCTCCTGC AGGGAAAGGC GAGCTCATGC 1920

TGAGAAAAGC ATCAGTTGTC CCGGGACCAA GCGCATCCGC TGTGATAATA ATGGCTGAAT 1980 

AGCCTGCCGC TTTTGCACGC TCCAGTAAAC TTCGGGTCAC ACCAGCATCC GCGTTAAAAT 2040

ACAGCTGGAA CCATTTAGGT CCTTTACTGG CTTTTGCAAT ATCCTCCAGA GAGCGGTTGG 2100 

ATGCCCCTGA TGATTCATAA AGTGCCCCGG CCTTTTCTGC ACCCGCTGCA GGAATCACCT 2160 

CCCCTTCCGG ATGGACGAAC ATATGCGCGC CCATAGGTGC TATCAGCAGG GGATGTTCCA 2220 

GATGATGGCC CAAAAGGTCA GTCCGGATAT CAATGCTGTG GGCAGCAAGT CCAGTGAGTC 2280 

GGTGAGGTAA CAAAGGATAA TCACTGAANT GCCTGCGGTT CTCATGATAC GTCCACTCAT 234 0 

CTCCAGCACC ATGAGCAATA TATGCATACG CAGCTTCCGT CATCACATCT TTTGCTGAAG 2 4 00 

TCTYCAGTCT GTCCAGACTG ATGATATGAA GAGATTTGCT GGTCGATGTA TCAGCATGTC 24 60 

CAGACGTTTT AGTGATGATA TGTGCCGTTG AAGATGAGAT ATTTTTGGCA AGGGCCGGCG 2520

CAGTTGACAG CCTGCGGCAG ATATTCCTAA AACGGCATTC TGAATAAAAT TACGTCGGGA 2580 

AAGAGGCATA ATAAGCTCCA TATATTATAA ATAAGCCAGG TCTCCCTGGC TTATAATGAT 2640

CATGCCACGC CCTGAAGCGG GTTGGTGTTG AAGGTATAAA GGAAAATTTT CCATTCACCA 2700 

TTAATTTTAC TGAGGACAAA AACTTCACGG TTCAGGTCAA TAATGGTTTT CTGCTCTTTA 2760

AAGTTCGTTA CAACAGAACC CACATGGTGG TGAGTGCGGA CAACCGCGGT ATCTCCGTTG 2820

ATCCAGATAG AGTCAAACGC AAAATCGGTC TCAAACTTTT CACGCTTGAA CAGATCATCG 2880 

TACTGCCGCT GGCGTTTTTC TGTATTGTCA GCCGTCAACT TATCATTCCA CTGGGAATAA 2940

CTTTCATCAG CAAACAGGCC CAGGATGGTT TTTGTATCCC CGGCATTCAG TGCGTTCTGA 3000 

TACTTGATTA TCGTGTCATA CACGTTCTTC TGCTCAGTAG CAATCTTACT GTCTGTGGAG 3060

TATTTGAATG TACCGCCGGA TTGTTCAGGT GAGCTTTCCT TCTGTGCTGT CGACGATGAG 3120 

GCAGCCAGAG CATTAGAGCC GAAAAGAAGG GATGATGCCA TGACTGCTGT TGCTATAAAA 3180

TGTTTCATAT ATTCTGCATC AGTTCTTCTG GGGATCTGTG GGCAGCATAT AGCGCTCATA 3240
CTATGCTGCT GTTTCAATAT TAGCGGCAGA CGTCAGCCTT ACCGCACTAC TTATTGGATA 3300

AGAATATCAA AAGTGACCGT GAAGTCAATT TTATCACAAC ACAGAAGGCC ACTATTTATG 3360

CGCAGAAAAT ATGAATCGTC CTCATCATGC ACGAAAGACT CGTAGTTGCA GGGCGGAAAA 3420

AACTGCGAGG AGACGACAGC AGATAGCCCG GGGAGCAGTT GAGGAGTTCT CTGCACAAGG 3480

GTTCGCTCGC GCCACATNCA GCAATATGAG CAAGCGCGCA GGAGTAGCTA AAGGCACGGT 3540

ATATAACTAC TTCCCAACAA AGGAATTATT GTTTGAAGCG GTTCTGAAGG AGTTCATTGG 3600

TACCGTCGGT ACTGAACTGG AATCTTCCCC CCGCCGCAAC GGGGNAAACC GTAAAAGCCT 3660 

ATCTGTTGAG AGTGATGTTA CCTGCGGTCA GGAAAATTGA CGACGCATCA AGAGGCAGAG 3720

CCAGAATAGC CCACCTGGTT ATGACAGAAG GGAGCCGGTT CCCGGTAATC GGTCAGGCTT 3780

ATTTACGGGA AATACATCAG CCACTACAGC AAGCCATGAC CCAACTGATT CAGGAAGCAG 3840

CATCAGCCGG AGAGTTAAAA GCAGAGCAAC TGCTCTGCKT CCCGTGTTTA TTGCTGGCTC 3900

CAAACTGGTT TGGCATGGTG TATAACGAAT TCTGAACGCG GCAGCACCGG TCAGTACAGG 3960

CGATCTTTTT GAAGCCGGAA TTGGTGCTTT TTTCCGATAG ACACATAACT GTCAGTATTA 4020

TGACCATGCC GTCAGGAGGA GGTATACCAG TGATACCCTG CCATGACCCG GTAACGTCTC 4080

CTGGCTGCCT TAAACCTGAA AGACCTGGCC CCACCACACT GCCGGTTACG CATCAAGATG 4140

CAGCAACCCT TGCATAAGGC TGTTTTGTGC AGAGGGCTAC CGGAAAGATA ATAACGTCAC 4200

AGCCCGTATG CATCAGATAA AACAGTGTAT TTTATCTGTC AGCAGTCACT GGAGCGGATT 4260

GTGGGGCGAG ATTCAGGTGC TGATACTGTA ACGACTCTGC GCCGCTGCTG CGGTAAAAGC 4320

GGCTGCCACC AGGCACGGTT ATCAGAGGAG GATGACCGTG TCCGCCCCTG GTGGTGATGA 4380

ACTCTCCATC ACAATCAATA ATGCCGCCGG GTGGATGAAG CAGACAGGGA TGGCAAGTCC 4440

CACTATCCCG GATAAAATGG GCTCTGGGCG CTCAGAAGAC CTGTGTGTCA GGCAGGGGTG 4500

AGAACGGTGA TGTTTTTTGT TGTCTGAAAG TCCAGCTCCA GCATTGCCTG CCAGCCTCAA 4560

GACTTCCGCT TTCTGCCCTT TCCGGCATTT TCTTCCGTTA CCATCATTCT GTTAATTCAG 4620

AGGCGTAGTA GTAGTAAACG TAATACATAT CCGGGAGGAT GAAGTCATCT AATCCTGCTC 4680

CCCGAATATC ATACAGCCAT TCCTGAGTGT GACTGCACCA TTTCCAATTA TGCAGTCTGT 4740

CCTCATCACA AAAATGTTGC AAGCAGTGCG GAGTCACGTT CCGTATTCAT GCCCTCTGCC 4800

AGATATTGAG CGGGGGAGAA ATGTGTAAGC GTCAACAGAG CGCCGTATTG ACACTTATTT 4860

ATCGGTGAAA ACTACGTTCC ATGGCAGCAG TTCGTCAACA CGGTTGGAGG GCCATTCCGG 4920

CAGTACGCTC AGGATATGGC GCAGATACGC TTCTGGATCG ATACCGTTCA ACCGACAGCT 4980

CCCGATTAGT CCGTACAGCA GAGCTCCGCG CTCGCCTCCA TGATCGTTGC CGAAGAACAT 5040

GTAATTCTTT TTCCCGAGAC AGACGGCACG AAGCGCTCTT TCTGCTGTGT TATTGTCCGC 5100 

CTCCGCCAGA CCGTCATCAC TGTAATAACA GAGGGCGTCC CACTGATTCA GGACATAGCT 5160 
GAACGCTTSR CCCAGTCTGG ATTTTTTCGA CAACGTGCCA TTCTTCTCCA CCATCCATTC 5220

ATGCAGCGAC GTCAGTAACG CTTTGCTTCG CTGCTGCCTG GCTGCAAGAC GTTCAGACTC 5280

CGGTAAGCCC CGTATTTCAT CMTCAATGGC GTACAGTTCA CTGATGCGCT TCAGAGCTTC 5340

TTCTGCCGTC GTACTTTTGC TGCTGATGTA TACATCGTGG ATTTTTCGCC GGGCATGGGC 5400

CCAGCACGCA ACTTCTGTCA GTGCACCACC TTCACGTTCG GCACTGAACA GCCGATCGTA 5460

ACCGCTGAAT GCATCCGCCT GCAGGATACC CCGGAAGGGA CGAAGGTGTT GTACCGGATG 5520

TTTTCCCTGC CTGTCTGGTG AGTAGGCGAA CCAGACCSCC GGTGGCTCTG ATGAGCCCGC 5580 

ATTCCGGTGA TCCCSGACAT ACGTCCAGAT GCGTCCTGTT TTTGCCTTTT TTCTGCCCGG 5640

TGCCAGCACT TTTACTGGTA TGTCGTCAGT GTGAACCTTG CGGGTGTTCA TCACGTAACG 5700 

GTAGAGGGCA TCATTCAGCG GAGTCATTAA CTGGCAGCAC GCGTCAACCC AGTTGGAGAG 5760

TAATGCACGG CTCAGTTCGG CACCCTGTCG GGCAAAGATT TCACTCTGAC GATACAGTGG 5820

CAGGTGTTCG CAGTATTTTC CCGTTAACAC GCGGGCAAGT AATCCGGAGC CCGCGATGCC 5880

GCGCTCTATC GGGCGGGACG GCGCTGGCGC TTCAACTATA CAGTCACATT TTGTACAGGC 5940

TTTTTTTACC CGAACAGTGC GGATCACTTT CAGGGCGCTA CTCAGCAGTT CCAGCTGCTC 6000 

AGCACTAACT TCACCCAGAT AATCCAGCTC ACTGCCACAC TCCGGGCAAC AACTTTCTTC 6060

AGGCTCCAGG CGGTGTATTT CACGGGGAAG ATGTGCTGGT AACGGACGAC GATGACGTGA 6120 

TTGTCGCAAC TGGCGGGGAA CTGCGGGTCA TCCTCACGCC CACTGTAACG ATCGCTTTCC 6180 

TGTTCGCGTT GTTTCAGTTG GGCCTCAGCC TGTTCAACCT CACGCTGCAG TTTTTCAGAA 6240

CGGGTACCGA ACAGCATCCG GCGCAGTTTT TCTATCTGGG CCCTCAGATG TTCTATTTCC 6300 

CGCTCCTCCT CTTCGATCTT TTCTTCGGCA CGTGCCARTG CAGAGCGCAG GAAGGCCTCC 6360
GTCTCTTCAA CCAGACTCAG TTGCTGATCT TTCTGACGGA GGGCTTCAGC CTGCTCAGAG 6420 
AGTAGCCTTT CCAGCTCAGT GATACGAATG AGGTATTTCC GACTCATGAC CGTTTTTATA 6480

ATCCGGCCAT GACATTTTTA CAACATTGTC AGTGCATTAA GGCGGGATGT TTTGGGTTGA 6540

CGCCAGTCCA GTTTATCGAG GAGCATTGCC AGCTGCGAGC GGGTAATGGA TACCTTACCG 6600

TCACGCACCG CAGNCCAGAT AAACTGGCCT TCCTCCAGAC GTTTGGTGAA CAGGCACAGA 6660 
CCATCAGCAT CAGCCCACAG GATTTTAATC GTGTCACCCC GTCGGCCGCG AAAGATAAAG 6720 
AGGTGACCGG AGAAGGGGTT CTCATCCAGC ACATGTTGTA CCTGTTCACC CAGACCGTTG 6780 
AAGGATTTAC GCATATCAGT AACGCCGGCA ACCAGCCAGA TTCGAGTGTC TGATGGGAGC 6840

GAGATCATCG TCCTCTCCCG GTCAGTTCAC GGATCAACAC CGTGAGCAGC TCTGGTGAAG 6900 
GATTTTCCAG CGTCATGTTA CCGTGGCGGA ACTCAAGTTT ACAGGAACTG GCACTGACTG 6960

TGCTTTGTGA AGGAGTGGAT AAAAGCGGAG TAAGAGCCGC CATAGGCTCT TTCTGCTCAT 7020 
CAGGCGTTAT CTCAACAGGT AATAATTCAA CGCCAGCGCC AGAAGAGGTT GTTACCGGAA 7080
GACGCCGCGA TATACGCCCT TCGTTCTGCC AGAGCCTGAG CCATTTGAAC AGGAGGTTAT 7140

GATTGATATC GTGTTCGCTG GCAATACGGG CAAGAGAGGC TCCTGGTTGT GAAGCCAGTT 7200 

TAACGATTTG AAGTTTAAAC TCATTTGAAA ATGTTCTGCA GG3TTCTGCG GATAATATTT 7260 

TCTGTTGCAT AACAGGTGTC GACTAGTTGA AAAAGTGGGC ACCTACGTTA CGAATACTGG 7320 

CTTAATGGCT ACATACGGCG GTCAGTTTAC GCTTACAGAA ATGTAATGAA CACGTCCTAC 7380

CATTAACTGA AGAGCATGGT GACGGATGAA GGAAAAAGCA GGAGTGTGTG GTGGCTCACA 7440

GATTTCCGAC ATCATAGCTG TCAACGACGG ATGAAAAGCG GCTCTTCCGC AACTTGGGTG 7500

GAAGAAAATG GATGAAACTT TCTGGTGTGA GAACCTTAAG GAAACAACAT GTTGGGTGGA 7560

GCGGACAATC CAAATGGTGA ATTACCGTCT TATATCACTG GCGCTGACAT TCCGGGCGTC 7620

TTCTCCGCCA CAACGCCATT TGCAGTGCAT CACAGGCCAG TTGTGCTGTC ATTCGCGGTG 7680

ACATCGACCA GCCAATAACG GCGCGTGACC ACAGGTCGAT GACTACTGCG AGATACAACC 7740

AGCCCTCATC GGTACGCAAG TAMGTGATGT CACCCGCCCA MTTGTGGTTC GGAGCGTGGC 7800 

GCTGAAGTTC CTGCTCCAGC AGATTCTCCA ATACGGGCAG GCCATGTGCA CGGTAGCTGA 7860

CCGGGCTGAA CTTCCGGCTG CTTTCGCCCG CAGCCCCTGA CGACGCAGGC TGGCGGCAAT 7920

GGTTTTAATA TTGAACTCCG GCATTTCGTC AGCAAGGCGG GGAGCACCGT ATCGCTGCTT 7980

TGCCTCAATG AATGCCTTAT GGACAGCGGC ATCGCAGGTG AGCCGAAACT GTTGGCGCAG 8040

GCTCATCTGG TGACGACGCC TGAGCCAGAC ATACCAGCCG CTGCGGGCAA CCCGAAGTAC 8100 

ACGACACATC GCTTTGATGC TGAACTCTGC CCGATGATTT TCGATGAAGA CATACTTCAT 8160 

TTCAGGCGCT TCGCGAAGTA TGTCGCGGCC TTTTGGAGGA TGGCCAGTTC CTCAGCCTGC 8220

TCCGCCAGTT GTCGTTTAAG GCGGACATTT TCAGCGGCCA GTTCGCTTTC GCGGTCTGAC 8280 

GAACTCATTT GTTGCTGCTG TTTACTGCGC CAGGCATAAA GCTGAGATTC ATACAGGGTG 8340

AGTTCACGGG CTGCGGCGGC CACACCGATG CGTTCAGCGA GTTTCAGGGC TTCGTTACGA 8400 

AATTCAGGCG TATGTTGTTT ACGGGGCTTC TTGCTGATTG ATACTGGTTT TGTCATGAGT 8460

CACCTCTGGT TGAGAGTTTA CTCACTTAGT CCTGTGTCCA CTATTGGTGG GTAAGATCAC 8520 

TCAGCAACGT ATCAAAAGTC TGTAAAATCA TGGGCGTTTC GCGTGATACA TTTTATCGTT 8580

ACCGCGAACT GGTCGATGAA GGCGGTGTGG ATGCGCTGAT TAATCGTAGT GCGGCGCTCC 8640

TAACCTTAAG AACGTACCGA TGAGGCAACT GAACAGGCTG TTGTTGATTA CGCCGTCGGT 8700

TTCCCGGCAC ACGGTCAGCA CCGGACCAGC AAACAAGCTG CGTAAACAGG GC 8752
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2417 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TGGTCAAAGA TGCAACTGCA TTTCGTCGCG GCTTTGCGGC AAATACTTAC ATCGCAGAAA 60

TACTGTGCGG AAATCTGCAT CCATTTCCAC TTGCTGTATG GCATAACTTT TCAGGCGGTC 120

CGGATACTGC CGAAGATTAT TATGCCACAT ACCACCCGTT ATGGGGGCAA TATCCGGAAG 180

CATTGCTGTT TGTAAACTGG CTCTATAATC ATTCCTCTGT GCTGCATGAA CGGGCAGAAA 240

TCATTAAATG CGCCGAAATG CTGATGCAGG AAGATGATTT CGAAATATGC GAAAGTATTT 300

TAAGACAGCA GGAGAAGTTG CGTGAAAGAA TTGATGAGAC GCTTTCTGAG AAAATTGTAC 360

AGAAATGCAG AAATATGAAT GGTGAATATG TCTGGCCCTG GATATTGCCG TTTTCAGCGG 420

CAGGCATGAA ACATACTGGC ATACAGTATC AGTAGATATT GCATTAGTGT ATCCTGCACA 480

CAAGTAATAA TTTATCCACC AATAATAACA CTGTTAATGT CCCCTTCCCC TGGTTGTCAG 540

CCAGGGGTTA TCTTCTGAAT ATTTCTTTTG AAAAGGATAA CACAATAAAT TATTTTTATG 600

AATTATCCCA TGGACTCATT AACACCCTTT CATAATGTTT TATTGTCAAA CACGTTATGG 660

CTGACATCAA AAAAAACCGG ATTTCCTCTG CCAGCGGGTA ATCACCTCCC CGGTGTTTTC 720

GGTTGGTCTG GTTACTCCTG TCTGGTTATT AGCAAGATAA TTGCTATAAA CAGTGGAAAA 780

CTCATCGTAC ATAATCTGGT GATGAACATT ACGCTTATTT TCCCTTGACC GGAAGAATCA 840

GAGGCTGCGG TTTCAGACTG TCTGCCGGTA CATTCCTCTC TCCGTTAAAA ACCATAATGG 900

GTTCATTATC TTCGTCTGTC AGTAGATTGA ATGGCGGTAT ATTTTCAGTA CGAATGCCGG 960

TCAGCCACTG AAAAATACCT GCGAAATGAC GGGCACTGAT TTTTCTGCTG ACGGACTGAT 1020

GAGACGTGAT GTCACTGGCG GTAATAATCA GGGGAACGCT GTAGCCTCCC TGCACATGAC 1080

CATCATGATG AACAGGATTA GCACTGTCGC TGACCGACAG CCCATGGTCA GAAAAGTAAA 1140

GCATGACGAA ATGACGGGAA TGCCGGCGAN GGATACCATC AAGCTGACCG AGAAAGTTAT 1200

CCAGTTTACT GATGCTGGCG AGGTAACAGG CAACCTTTCG GGGATACTGC TCCAGGTAAT 1260

GATTCGGCCA GGAGTGAAGC CGGTCACACG GGTTCGGATG AGACCCCATC ATGTGCAGGA 1320

TTATCCGCCA GCGCACGTTC TGTTTCCTGT AACAACAACA 1380

TGTCATCCGT TTTACGGGAA GCGAATGCSC TTTCTTGAGG AAAACGGTAT GCTCCGCATC 1440

AGAAGCAATA ACAGAGATGC GTGTGTCATG CTCTCCCAGT TTTCCCTGAT TGGATATCCA 1500

CCATGTGCTG TATCCTGCTT TTGCTGCCAG CGCCACCACG TTGTTGCCGG AATCAGGGTT 1560

CTGCTCATAG TCATAAATCA GTGTCCSGCT CAGGGAAGGT ACGGTACTGG CTGCTGCCGA 1620

TGTATAGCCG TCAATAAATA AACCGGGAGC TGTCATTCCA GCCACGGCGT GGTTGGCCAC 1680

GGGATAACCA TATACCGACA TATAATCCCT GCGCACACTC TCACCAGTGA CAATCACAAT 1740

CGTGTCATAT AACGGTGTTC CCCGGCCAGG ATTTTCCCAG TTGTCAGCCC CGTGCTGACT 1800

CAGTTGTTTA TAATGC7GCA TTTCACGCAA TGTGTCAGTT GTCCCCAGAA CAGTTCCTTT 1860

AACCATCCGC AACGGCCAGC TGTTTACTGA GCATAATACG AACAGCAGCA GTGCCAGCCA 1920

GTTACGGTGA CCACGGCGGT GTGTTCGCCA GAAAATCAGC ATGAATACCT GAATCGCGGC 1980

ACTGACCAGA AAATGATAAA CAGGAATCAT CCCGGTAAAC TCCGCTGCCT CATCAGTTGT 2040

GGTCTGCAGC AACGCGACAA TAAAACTGTT GTTGATTTTA CCGTACGTCA TACCGGCAGG 2100 

CGCATACAGT GCACAACAGA ACAGAAATAA CAGCGCTGTA ATGGATGTGA GGGTATTTCT 2160 

GTGTGCAAGG AGCAGAAGGA GAAACAGAAG CAGCACATTT CCTGTTGCAT TCCTCTCAGT 2220

GTATCCGCAT GCAATTGTGG TTATTGCAGA CACAACAAAA AAGAATAAAA ACAATAAAAT 2280

CGGGGGGGGG TTGCCCGGAC AAAACAGTTT TCTGATATTC ATCGGAGTAT ATCGACAACA 2340

TTATTATGAA GAGAACAGGA TAATAAAAAT CAGAAATTAT TGTAAAACAG ATAAAAGCAN 2400

CNATGCAGTA ATAGACT 2417 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6294 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

AGACAAAAAC CAGTTACGGT TATCACGTAC CAGCCCCCGT ATTTCCAATT TATAATCCTG 60 

GCCATCAATT ACTGGGATCT CTTCTTCTCC ATAGAAGGCA TTAAAAGGGA ATGGAGTGGT 120 

AATGTCCTCT GGAAGATATT CTGGTGCCAC ACTGTTTTTG CTGAACAGAA AACTTTGAAT 180 

CCGGTCATTA AATCTGGATA TACGGAACAA TGCTTTTTCA ATATCATCAT TATTGCTTAT 240

ATCACAGCCA GTCAGCATCA TAATTCCCCC AAGCGTCAGT CCCTGTTGGA GTAAACGACG 300 

TCTGTCCGGC GCAAGGATTT TTTCTGCATC TTTCACCACG TAATGGGCAT CACTGTCAGA 360

CAAAAAACGT TTTTTCTTCA TTAGTGACCC CGTATCATAG ATAACAATGC ACGCGGAACC 420

AATAACACCA TAACCAGGTG AATAATAATG AACAGTACCA TAATGTTCAT GCACAGAAAG 480

TGGATATAAC GCGCTGTATC ATAACCACCG RATAGTATAG TCAGAAGGGA AAACTGAACG 540

GGTTTCCATA AAACCAGACC AGACAATAGA AGAGCAGCGC CATCTAAAAT AATCAGAATA 600 

TAGGCGACTT TTTGCACCAT ATTGTATTCC TGCATATTCG TATGATGCAG CTTTCCATAC 660 

AGTGCCTGCG TAAGGGATTT TTTCAGTGAG GTCCATGACA GCGGGAAAAA CTTGCTCCGG 720 

AAACGTCCGC TACAAATTCC CAGAGTAAGA TAGATCGTGG CATTAATCAG CAGAATCCAC 780 

ATCAGGGCGA AGTGCCACAG TAACGCACCG CCAAGCCAGC CACCGAGAGT TAATGCTGCC 840

GGATAGTTAA AAGAAAACAA AGGAGAAGCA TTATAAATGC GCCATCCACT ACATATCATG 900
CCTGCGACAG TAACAGCATT AATCCAGTGG CAACAGCGTA ACCACAGAGG RTGTATTTGT 960

TTTAACGGTA ATGGCTGCAT TATGTGATCT CTGTCTGTAA ACTAAGTATA TTATGGAAAG 1020

GAATGTTCAT CACATCCTCA CAAGAGTTTA AAAAAAATGT GACAANTCAT CGTCAAATGC 1080 

TGGGGTAAAA TTCAGATAAA GAATATGTGG ATAACTTTTG ATGAATAACG TAAAAAAAAT 1140 

ACTGCTGATG GAAGATGATT ATGATATTGC AGCTCTGTTG CGGCTTAATC TGCAGGATGA 1200 

AGGGTATCAG ATAGTTCATG AAGCGGATGG CGCCAGAGCT CGTTTATTAC TAGACAAGCA 1260

GACCTGGGAT GCCGTAATAC TTGATCTTAT GCTGCCTAAT GTTAATGGGC TGGAGATTTG 1320 

CCGTTATATC CGTCAGATGA CCCGTTATCT GCCTGTGATT ATCATCAGTG CCCGTACCAG 1380 

CGAAACCCAC CGCGTCCTGG GACTGGAAAT GGGGGCTGAT GACTATCTAC CGAAACCCTT 1440

TTCCATTCCT GAGCTGATTG NCCCGCATCA AAGCGTTGTT TCGTCGTCAG GAAGCCATGG 1500 

GGCAAAATAT TCTCCTGGCA GGTGGACTGA TTTGCTGTCA CGGTCTGTGC ATCAATCCAT 1560

TTTCACGTGA AGTTCATTTG CATAATAAAC AGGTTGATCT TACCCCACGC GAGTTTGATC 1620 

TGCTGCTCTG GTTTGCACGT CATCCTGGCG AAGTTTTTTC CCGTCTTTCA CTGCTGGATA 1680 

ATGTCTGGGG GTATCAGCAT GAAGGATATG AGCATACAGT CAACACGCAT ATCAACCGTC 1740 

TTCGTGCCAA AATTGAACAG GATGCAGCAG AGCCAAAGAT GATCCAGACC GTCTGGGGAA 1800 

AAGGGTATAG GTTTTCAGTT GACAATGCAG GAATGCGATA AATGAATTGT AGCCTGACAT 1860

TAAGCCAGAG GTTAAGCCTA GTATTTACAG TCGTTTTGCT GTTTTGCGCC GTGGACATGT 1920 

GGCGTTCATA TTTACAGCAG TAATCTGTAT GGCAATGCAA TGGTACAGCG TTTATCTGCA 1980 

GGCTGGCGCA ACAGATTGTC ATCACGGAGT CTCTGCTGGA TAATCGTGGG CAGGTGAATC 2040 

ACCGGACATT AAAGAGTCTG TTTGAGCGTC TGATGACGCT TAATCCCAGT GTGGAGCTGT 2100 

ATATTGTCTC GCCGGAAGGT CGGCTGCTTG TGGAGGCCGC CCCTCCAGGT CATATCAAAC 2160 

GTCGGTATAT CAATATAGCG CCCTTGAAAA AATTTCTCTC CGGTGCTGTC TGGCCCGTAT 2220 

ATGGTGATGA TCCCCGAAGT GTAAATAAGA AAAAAGTTTT CAGTACCGCA CCGCTTTACC 2280

TGAGGGATGA TCTGAAAGGA TATCTGTATA TTATTTTACA GGGAGAGGAA CTTAATGCTC 2340

TTACTGATGC AGCCTGGACA AAGGCACTAT GGAATGCACT GTACTGGTCG CTGTTTCTGG 2400

TAGTGATATG TGGTCTGCTG TCGGGTATGC TGGTCTGGTA CTGGGTAACC CGTCCCATAC 2460

AGCAACTAAC TGAAAATGTC AGCGGGATAG AGCAGGACAG TATTAGTGCC ATTAAACAAC 2520

TGGCAATTCA GCGCCCTGCC ACCCCCCCTA GCAACGAGGT CGAGATATTA CACAATGCCT 2580

TCATTGAACT GGCCCGTAAA ATATCCTGTC AGTGGGATCA ACTTTCAGAA AGTGATCAAC 2640

AGCGCCGTGA ATTTATTGCC AATATCTCCC ATGATTTACG GACGCCATTA ACATCACTTC 2700 
TGGGATATCT GGAAACCCTG TCAATGAAGT CGGATTCGCT ATCATCAGAG GACTGTCATA 2760 
AATATCTGAC AACAGCTCTC CGGCAGGGAC ACAAGGTGAG GCATCTGTCC TGTCAGCTTT 2820 
TTGAGCTGGC ACGTCTTGAG CATGGTGCTA TAAAACCTCA ACTGGAGCAA TTTTGTGTGT 2880

GTGAACTTAT TGAGGATGTA GCTCAAAAAT TTGAGCTCAG CATAGAAACC CGTCGATTGC 2940

AACTAAGAAT TATGATGTCA CATTCCCTGC CTCTTATCAG GGCAGATATT TCAATGATAG 3000 

AGCGTGTGAT AACAAATTTA CTGGATAATG CTGTACGCCA CACACCTCCG GAAGGCTCGA 3060 

TCAGGCTGAA AGTCTGGCAG GAAGATAATC GGTTGCACGT CGAAGTGGGT GACAGCGGCC 3120 

CTGGACTAAC TGAAGATATG CGAACTCATC TTTTCCGGCG GGCATCAGTG TTATGTCATG 3180

AACCGTCAGA AGAGCCCCGG GGAGGACTGG GATTGCTGAT TGTACGCAGG ATGCTGGTAC 3240

TACACGGTGG TGATATCAGG TTGACTGATT CAACGACTGG AGCCTGCTTT CGTTTTTTTC 3300 

TTCCATTATA ACATCAGGCG GCATATTTTG GGGTGGTTAT GTGTATCTGC CTTTGTAAAA 3360 

GGGATACAAG TTCTGTAGTG GAGCAGAAAA TCAGGACACC GGAATAACCT GTTTCCACTT 3420

TTCTTCATGT AAGCAAGGCG GTAAACCATC GTTGTTCGTG TGAGGTCGAT AAACGTTGTA 3480

ATAACCATTA ATCCACTGGT TTATATCACG TACCGCATGG ATAAAATCAC CATAACCACC 3540

TTTCGGAAGC CATTCATTTT TAAGGCTGCG AAAGACTCTT TCCATCGGCG AATTATCCAG 3600

GCCATTCCCT CTGCAACTCA TACTTTGCAT TACCCCATAA CGCCAGAGTA ACTTTCTGTA 3660 

TTTATTGCTT TTATACTGAA CACCTTGATC TGAATGAAAG AGCAGGCGGC CATCACGCGG 3720

TCGAGTTTCC AGTCCGTTAC GCAAAGCCCT ACACACCAAC TCAGCATCAG CGGTTAATGA 3780 

GAGGGCTGAA CCGATAATCC GCCGTGAATA TAAATCAACA ACGAGCGCGA GCTAACACCA 3840

TTTGTCCTGC AGGCGAATAA AACTGATGTC GCGCACCAGA CGCAGTTTGG TGCGGCGGGG 3900 

TGAAATTGCC GGTTCAGTAA ATTTGGCAAT GGCGGACTTT TGTCTTCGTT TACCCGGTTG 3960

TGATGTTTAA CCGGCTGTCG ACTTGTCAGC CCTCATTCCC GCATCAGTCG TCATGCCAGC 4020

CACCGGCCTG CATCAACGCC ACTCTGGCGC AACATCTGAG TGATTGCCCG GCTACCCGGC 4080

TGCGCCACGA CTGAGAGCAT GGAAAGCCCT CACCCGGCTT CGTAATTCAA TTCTTTGCAC 4140 

ATTAACAGGA CGCTTCACCT GCGCGTAATA AACGCTACGG TTAATACCGA ATAAATGACA 4200

AATAACCCAC ACTGGCCACT TTGCTTTCAG CTGTGTGATT AGCGCGACAG CTTCCCGGGG 4260

ATTTCGCTCA TCAGCACGGC AGCCTGCTTT AGTATTTCTT TTTCCATCTC AACGCGCTTT 4320 

ATCTGCGCTT TAAGCTGCTG AATTTCGCGT TGTTCAGGGG TAATAGCATT ACCAGCTGGC 4380

TCAATACCCT GAAGTTCCTG CTTATACAAC CGTATCCATT TACGCAAATG GTCAGGGTTG 4440

AGCTCGAGTG CCTGCGCGAC TTCTCTGACA TCACGCTGGT ATTTAACCAC CACCTGCTCG 4500

AAAGCTTCAA GCTTGAACTC CGGGGAAAAG GTACGTTTAG TCCGACGAGT TTTGATCATG 4560

CATCACCTCA TTTTCACTGT TTTAACATTA ACAGGATTTC GAGGTGTCCT GAATTACCGA 4620

TCCACTACAA AGTACGACAG GTACTGTGGA GGTACTCCCG TAAAGACGGC CATCAAGCTC 4680

CCGCTCCGAC ATACCTGCGG GCAGAGGCCA TGAAAAGCCA GCTTTGCGAA AGCGCACGAA 4740



NSDOCID <WO_ 9822575A3_IA> 



PCT/US97/21347 




CATACCACAA GCTGTTGATT TTGGTACGCC CAGGCGACGC CCGACCACAA CCTGGGGTAA 4800

ATGTTCTTCA AAGTGAAGAC GTAAAGCTTC AGTGATCCAA GTCCGGTGTT TCATACGATA 4860

GTGTCCATTA AAAATGATGG ACATTATTTT TGTAAAACCG GAGGAAACAG ACCAGACGGT 4920

TTAAATGAGC CGGTTACATG TAATCCATAC TCATCCAAGG TTTAATTCTG ACACAATAAG 4980

AAAATATGGA AAGTCTCGCT CTAGAGATGG GGAGAGGGAT ATTGAAGTGT ATGATATTCC 5040

AAGAACTGCC GGAGATATCC TCGTAAATGG ATTTTCCAGT GCAAACTGAT AACAAATTCG 5100

AAGTCATTAT CTGCAACAAG ATTGATTGAT GTAGGGGATA TGTTAGAGCA TTATAATGCT 5160

CAAGGATTTG GCGTGATGAC ATCTGCGCCA ATTGATGCGA CACTATATGA TAAACTGGAT 5220

GCTATTTGCA GTAAGTGTAA AATAGAACAA ATAAATTTTT CAGTATTAGA GTCAGAACGC 5280

GCACTATATT ATGACGATAT ATTAAGATGC CGTTACTTTG GTAAATAMCA TAAAATTAAT 5340

CAATATGGTA ATATATCAGT TGTAATTGAT CGAAACAAAG CACATAAATG CCATCTTATA 5400

AAGATGGTGT TTKTTAAGCA TATAAAATAT ATTTTCTATA AGATATAGGG CAAACTAAAT 5460

TTCTTGACTT CTATGATGGA CTAACTAGAT ATACATGCCG CCAGTTTTTA TAAAACGACG 5520

GCATATATAA TCATTTATAT ATCTTTTGAT TTTATTCGTA ACCACTCATG TTGATCTAAA 5580

CCTATTCTTG ACAGATTAGC AACAATATCA GTTGTTATTT TTTGCGCGTA CGTTGTTTTT 5640

ATTTCCCCGA TCCATTTCAA TACTTTTGGA GTAGATATTT TTTCAACGAG TAAAGGAACG 5700

AATGAGATAT AGTCAGTATT AACTAGATTG TTCTTTTTCC CTATGATGAC ACCGTTTCCA 5760

TTTTCGACTC CAAATGAAAA TGAAATAATA TTAGAAGCTT TTGCCGGCAT TTTAATTTTA 5820

TAAAAACCGC CATATTCATC TTCGATTAAC AAATTGTAAT TATTATCGTC CAGTGTTCCC 5880

CTGAGGAATA AAAAATCGGC TTTTTCATGC AATCTGACGC TATCACATAA TGGTTGTATG 5940

PATAGATAGA CAAAATTATA TGCATCTAAA AGTAAAGTTC CTTGTTTTAA GGACACATTA 6000

TCTATATGAG AATGATATCT TAAACTCCTG CGCGTGATTT CCAGAGAGCA TAATTGCATT 6060

AACTTTTTAT CTTCTTCACC ATCTTGGCTT AAGTATTCCT TTTTACCTAA AGATGCGTGT 6120

TCAATAGCGT GTTGAATTTC TTCTAAAGAA TCAGCAGAGA GTATATTCCT TAGATGTTCT 6180

ACTGATAAGT CTTTTTGTTT TTTTCCAGTT AATAGAAAAT TCTTACAACC ATTTTTTGCA 6240

TAGTGAAAAA TAGGCCAATG GGATAAGGAG TTTTTGCTTA GAGATTTCTG GGGA 6294



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4519 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TATTCCTTTC TCTCCCATGA TAGGGCGAAA GGCTTTATTA CTATCCACTG CTGGTTTATT 60

AATTGCATCA TCGTCGATTA ATTTGCTGGA GGTTCCAATA GTCAACCACC TCTCTTCAAA 120 

TTCATCGGTT GTCATACCTA ATCCATCATC TCTCAAGATA AGAAGATTTT CTTTCCTAAA 180 

AAAATCAACT TCGACATTAT CAGCATAGGC ATCATGAGCA TTTTTAAATA ACTCACTCAA 240 

GGCAGTAGGT ATACCTGCAA TTTGTTGTCT GCCAAGCATG TCCAAAGCTC GAGCCTTTGT 300 

TCTTATTTTA GCCATATATC TATGAATCCT TATTAGTACA ATTTTCTATG AGATGTAGCC 360

CAAATAGTCT AGCGAGTTCG CAAGGTACAG CATTGCCGAT TTGCTTTGCC ATTGAATTCA 420

GCGAACCTTT AAAAACATAG CTTAAAGGAA ATGTTTGTAA TCTTGATGCT TCTCTTATGC 480

TAATTGCTCT ATGTTGAGTG GGGTCAGGAT GCCCAAAACG ACCATTGGAG TAACTATTAC 540

ATTTCGTCGT AAGTGTAGGC GCAGGCTTAT CCCAACTCAT TCTTCCATAA GTATCTGTGT 600 

GGCCATCATA ATTTTTATGG CATTTATTAA CTAACTCTTC TGGCCAATTT CTTCTATCCC 660

CTCCTTCTGG AGTGTGCATA AKTCTTTTTA GGTTAAGAGG GCTCAGTGTT CCAGCCCTAT 720 

GTAAAGGATC TTTGGGGTCG GTTTCTCCTG AACATAACTT TGTGAAGTCC TGGATATAAT 780

CTCGTACAGT TTTGAATGGG ATTTTATTTT TACCATGGGT TATCTCTGGT AGGGTAACTT 840 

TACCTACTCG ACTAGCTAAG AGCACGAGTC TTTTTCTTCT TTGGGGAATC CCATAGTTCT 900 

CAGCATTGGC TATAAAAGAT ATATAGTTAT ACTCTAACTC TTTAAGTAGC TTAATAAACT 960 

CCTGAAATGG GCCTTCTTTT TCTTCATCAA TTTTTTGCAT TCCAGGAACA TTTTCAAGCA 1020 

TAATATATTC AGGAAGAAGT TCTCTAATAA AACGATGAGT TTCATTTAGT AGATTTCTCC 1080 

TTGAGTCGTC ACTAGTTTTA TTTTTATTCT GTTGCGAAAA TGGTTGACAT GGTGCACATG 1140 

CACTCAGTAA CAAAGGCCGT TTAGCTTTAA TATCAATGAT GTCGGAGATA TCTTGAGGTT 1200 

CGATTTTCCT AATATCATCT TGGATGAATT TTGCATCAGG GAAATTAGCT TTAAATGTTT 1260

CTGATGCTTG TTGGTCAATA TCTAATCCAA GCTCGATATC AAAGCCAGCC TGACGTAGCC 1320 

CTTCACTGGC TCCACCACAG CCACAAAAAA AATCTATAAC TATCAATTTG ATACCTTCTT 1380

TGAACTAAAT AAAACAACTC GAATAAGTTG ATATTTTAAA TAAAAATAAT TGGTATGGAT 1440

ATGAACTTTG GTCACGCTAC CGCCCTGAGK TCATGGCCAT CCCCAGACCT TTTAAAGGGA 1500 

TTATGAACAA CACCCAGCCG ACGTTCAACG GTGTTACCCA TACATATCAC AAAGTTAGTT 1560

AATTGGTTGG TCGTAAATTG ACCTAAAATG GATTGAGGGC AATGCAAAAA TCATTGGGAA 1620 

ATCCAGGCGA CACAGATGTT CGGAAGAGAC TGAATGTTAA AAATATAGAA TGTATATTCT 1680 

CAAAAAAGAG ATATTTCATT ACATTTTATA TGTGTATAGG AAAGTGAGAT TGGCGAATCA 1740

CCTCCCAATC ATCCCGCCAG CGCTCCATTC AGCGCCACGC CAACCCTCAC TCCAGCCCAC 1800

GTCATCGCCC CCAGCCAGAA TGTCGGCAAC ACCAGAAACA TCAACCTCAT CACCAGATTG 1860

ATAATCACGT CATCCTGCGT ATTCTGGATC CCGGCTAAAT TCCAGCTACT GTGGGTATCG 1920 
CTGTTGTAGA GCACATCCAG CAGCCAGCTA TCAAGCCACC GTGCCAGTTC CCACCAAAAG 1980

GTGAGGAAAA ATAGTGCAAA CTGCACAAAC GTCAGCGTCA TCACTACTTT CACATCCCAC 2040

GCCGAACAGA GCGTTATCAG CGGAATACAG ATCACCAGCG CTATTTGCAG TGCGCCTGTA 2100 

CCATCGGTAG TGCCTAACGC ACGCTGTCGA ATGCCGTACA TGCCGCTATG CTGCCGAGGA 2160 

TATTTCTAGC GCCGGATGCC AACCGGGTGG CGGCATTGGC GACGGTGCCA TCAACGTTAC 2220 

CGCCATAGCT TGGATAAACG CGCCCATTCT GCGATACCTG CATATTTCGT TCACTGACCC 2280 

GCGAGCGCAG CACGGCCTCT TCATACACTA CCTGCGACTG GTCGATTTTT TTAAACGCCG 2340

TCCAGATATC TAGGGCAGGA AGTTGCAGTA GACGGGCTTT CAGCCCAAGC GGTGTCGTCG 2400

GCCGACCGCT GTTTAGAAGT GGGATAGCCG CCCGCGCCCG TATCGGCCAG CCCGGCATCG 2460

CGCGATGGAC TGTACGGCCA AGCACTGTGT GGTGAAAGCG CATGGTCGGA AAAGGCCTGT 2520 

TCAGCTAACC AAGCACATCC CACCATCACA AGAATCGCCA GAAAACCAAA CTCAGTCAGA 2580

ATAACTCTTC CTGATTCAGG CTTTGCTCCT GCATTATGGC TACCACTATT GTTTGCCTGC 2640

ACGTATCATC TGATAACGGT TAATTAACTG ATTTAGCGCC ATTTCAGCCT GTTTTTGCTG 2700

CTGTTCACTG CCATTCTGGT TACGGACTTC ACCGTAGCGA CGTAACTGCT CTTCCGCCGG 2760 

GATATGCCGG TTAAAAGCCT GCATGATGCC AAACACCTCC GTTTTCAGTT CACTGACCGT 2820

CATGTATTTT CCCCGCTGTT CATCCTGACG GTTCAGGCGC TCAGCCAACT GCTGTAAGCG 2880

GATCATGCCT TCGTTCCAGC CCGTCATCGC CTCTTCCGGG AGCGCACGAC TCCTTACACT 2940

CTTCTGCCAG TTATCCACCA TTTCCTGAAC ACGGGGATTG CCGGGGACAA GAACCCTCAG 3000 

TTGCTGCAGC AGCTGCGCAC TGCACCGCAG GTTGTATGCT GGAGGTAATT CTGCCAGTCG 3060 

CGTTATCTGC TGACCGGAAA GGGTTATGCA GTGCACTCAG GGCAGATACC GGATTCAGGT 3120 

TAATTTTTTC AAACAGGGAA GCATATACGC TGTCGCCGGT ATGCGTTTCA GATACCACAC 3180 

TCTCTGCGAC GTTCTTTTCT TTCTGTACAG ACATCAGCAT TTTCTGTAAG CGTACAGCGA 3240

GGGCCGTATT GACGGGGATG TGTTATTCAG CTGGCAGTGC TATGCGCCAC GGAAGCAGTT 3300 

CGCTGACCCG GTTGACCGGC CAGTCTGCTA TGACGGCAAG CACATGGCGA AGGTAGCTTT 3360

CTGGATCCAC GTCATTCAGT TTGCACGTCC CGATCAGGCT GTACAGTAGC GCTCCCCGCT 3420

CACCACCATG GTCAGAGCCG AAGAACAGGA AGTTTTTACG ACCCAGACTG ACCGCCCGCA 3480
GGNCATNTTT CAGCGATGTT GTTGTCGATT TCCACCCAGC CATCGTTCGC ATAGTACGTC 35 4 0 

ATGCCGGCCA CTGGTTAAGT GCGTACGCGA ACGCCTTCGC CACCATCAGG CTGGACAGGG 3600 
GACTTTCACC CCCAAGCTGC TGAACATGCC CGGCACACAA AGAAGATCTC GGCTGAGTGG 3660

CCGGGATTAG TTATACAATT ATCTGATTGA TTTTTAATAT ATCTTTTCTT AAATCATCGT 3720

TAATATCTGA CGGTTCTAGC TGGTTTATAA GTTGCCTTAT TTGGGTAAAG GTACTTTTCT 3780

GATCTTTTAG ATCTTCTCCT TTTATCGTTG ATAAAGCTGC AATTAGTTCA CCATCGTAAT 3840
ATTCACCCGC TAACGGCTCT TTAG7TAGAA CTTCCAACAC TCTTGGCATC AACTGATCAA 3900

TACATAAATT TTGTCGGATA GCGCGGCAAA GATCTTCCAC TGTTAACTTT TCAAGAGGCA 3960

CATCTATGAT ACGTTCGAAC CAGAGTTCAA GCGGTGATTG TTGCTCAGGC TCTTTTGTCA 4020

TATTGATGTT TCCAATCAAT TTACGTAAGG TAATCATATT CCATATGGTT TCAAGGCTGA 4080

TTCTATTTTA TTAATAGCAT CTGTTGCTCT GCCATACGCA GCGTGAGCTT CAGGATTGTT 4140

GACGTTTTTC AACGTATCCG CATGATTTCT TAATCGTCTG AGCGTATTTT GCATTTCCTG 4200 

CATATGATCC CAATATCCTC CATTCTCTTT AGGAACTGGC TTACCATCCA TATCCTTGAG 4260

AGTTCCAATT AATATCATGA ATCTTTTCAG ANCATTTTTT TAATAGTGGT TAATCGANTC 4320

TTCTTTAANT CGGCAACTTT TCTTGGCCTT CCTGGAATTA AAGGCTTTAA TCCTAACAAG 4380

TTTTTTTCTC AATTTTTGGC TGGCTTTAGG GAATCAATTT TTCCCGGATT GGGTGGGTGG 4440 

GTGGTAACCC GGGTTTCCCT TGAAGCCCGG GAAACCCGGC CCCAAGTTCT TACTTTTTTT 4500

CCCGCAATCG GGTCAAGAT 4519
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1213 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATTACAGAAT GTGGAAATTA AGTATGATTC GAAAAAAGAT TCTGATGGCT GCCATCCCCC 60 

TGTTTGTTAT ATCCGGGGCA GACGCTGCTG TTTCGCTGGA CAGAACCCGC GCGGTGTTTG 120 

ACGGGAGTGA GAAGTCAATG ACGCTTGATA TCTCCAATGA TAACAAACAA CTGCCCTATC 180 

TTGCTCAGGC ATGGATAGAA AATGAAAATC AGGAAAAAAT TATTACAGGG CCGGTTATTG 240

CCACCCCTCC GGTTCAGCGC CTTGAGCCGG GTGCGAAAAG CATGGTCAGG CTGAGTACCA 300 

CACCGGATAT CAGTAAACTT CCTCAGGACA GGGAATCACT GTTTTATTTT AATCTCAGGG 360 

AAATACCGCC GAGGAGTGAA AAGGCCAATG TACTGCAGAT AGCCTTACAG ACCAAAATAA 420

AGCTTTTTTA TCGCCCGGCA GCAATTAAAA CCAGACCAAA TGAAGTATGG CAGGACCAGT 480

TAATTCTGAA CAAAGTCAGC GGTGGGTATC GTATTGAAAA CCCAACGCCC TATTATGTCA 540

CTGTTATTGG TCTGGGAGGA AGTGAAAAGC AGGCAGAGGA AGGTGAGTTT GAAACCGTGA 600 

TGCTGTCTCC CCGTTCAGAG CAGACAGTAA AATCGGCAAA TTATAATACC CCTTATCTGT 660

CTTATATTAA TGACTATGGT GGTCGCCCGG TACTGTCGTT TATCTGTAAT GGTAGCCGTT 720 

GCTCTGTGAA AAAAGAGAAA TAATGTACCG CAATAACGGT TAAATGCGGG TGGGATATTA 780

TGGTTGTGAA TAAAACAACA GCAGTACTGT ATCTTATTGC ACTGTCGCTG AGTGGTTTCA 840
TCCATACTTT CCTGCGGGCT GAAGAGCGGG GTATATACGA TGACGTCTTT ACTGCAGATG 900

AGTTGCGTCA TTACCGGATA AATGAACGGG GGGGACGCAC CGGAAGCCTG ACGGTCAGTG 960 

GTGCACTGCT GTCCTCACCC TGCACGGTGG TGAGTAATGA GGTGCCGTTA ARCCTCCGGC 1020 

CGGAAAATCA CTCTGCGGCA GCCGGAGCAC CTCTGATGCT GAGGCTGGCA GGATGTGGGG 1080 

ACGGTGGTGC ACTTCAGCCC GGAAAACGGG GCGTTGCGAT GACAGTCTCC GGCTCACTGG 1140

TAACCGGTCC CGGAAGCGGA AGTGCTTTAC TTCCTGACCG TAASCTATCC GGCTGTGACA 1200 

TCTTGTTATA CAC 1213 
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 451 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ACGCTCTAGT ATTCTCTGTC GTTCTGCCTG GGCCACTGCA GATAGAATAG TGACAACCAT 60 

TTTACCCATC TCCCCATCGG TACTGATTCC GTCATCAATA AACCGAATGG ATACACCTTG 120 

GGCGTCAAAC TCTTTTATTA ACTGGATCAT GTCAGCAGTA TCGCGCCCAA GGGGTTCAAG 180 

TTTCTTCACC AAGATGACGT CACCTTCCTC CACCTTCATC CTCAGCAAGT CCAGCCCTTT 240

CCGATCGCTT GAACTGCCCG ATGCCTTGTC AGTAAAGATG CGATTTGCTT TCACGCCTGC 300 

GTCTTTGAGT GCCCGAACCT GAATATCGAG AGATTGCTGG CTGGTTGATA CCCGTGCGTA 360

ACCAAAAAGT CGCATAAAAA TGTATCCYAA ATCAAATATC GGACAAGCAG TGTCTGTTAT 420 

AACAAAAAAT CGATTTNAAT TAGACACCNT T 451
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GACAAGGCTT ATAAACTCAC TGACGGGGCT GGCATGTTCC TGCTGGTACA TCCTAATGGT 60 

TCCCGTTACT GGCGTCTCCG TTATCGTATT CTGGGTAAGG AGAAGACTCT GGCACTTGGT 120 

GTGTATCCAG AAGTTTCTCT CTCCGAAGCT CGTACAAAAC GGGATGAGGC CCGAAAACTG 180 

ATTTCGGAGG GGATTGACCC TTGCGAACAG AAAAGAGCTA AAAAAGTAGT CCCTGATTTA 240

CAGCTCTCTT TTGAACATAT TGCACGACGC TGGCATGCCA GTAATAAACA ATGGGCACAA 300
TCACACAGCG ATAAAGTACT CAAAAGCCTC GAAACACACG TTTTCCCCTT TATC3GCAAC 360

CGGGATATCA CAACACTCAA TACCCCGGAT CTGCTTATCC CTGTTCGTGC TGCAGAAGCT 420

AAACAAATTT ATGAAATCGC CAGTCGTCTG CAGCAAAGAA TATGTGCCGT AATGCGTTAT 480

GCCGTACAGT CTGGCATCAT CAGATATAAT CCTGCTCTGG ATATGGCTGG CGCATTGACT 540

ACGGTAAAAC GCCAGCATCG CCCCGCTGTT GATCTTTCAC GTCTGCCTGA ACTTCTGTCG 600 

CGTATTAACA GTTATAAAGG NCAGCCTGTC ACCCGGCTTG CGTTGATGCT GAATTTACTG 660 

GGTTTTTATT CGTTCCAGTG AACTCAGATA CGCCCGCTGG TTGTGAAAAT TGATATTGGA 720 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2920 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
NCNTTAATTT TATATCTCGT AAAATAAAAT GTTTTCTGTA CCGCTCTCCG GAGGGGGGAA 60 

TGATTCGTTT ATCATTATTT ATATCGTTGC TTCTGACATC GGTCGCTGTA CTGGCTGATG 120 

TGCAGATTAA CATCAGGGGA AATGTTTATA TCCCCCCATG CACCATTAAT AACGGGCAGA 180 

ATATTGTTGT TGATTTTGGG AATATTAATC CTGAGCATGT GGACAACTCA CGTGGTGAAG 240

TCACAAAAAC CATAAGCATA TCCTGTCCGT ATAAGAGTGG CTCTCTCTGG ATAAAAGTTA 300

CGGGAAATAC TATGGGAGGA GGTCAGAATA ATGTACTGGC AACAAATATA ACTCATTTTG 360

GTATAGCGCT GTATCAGGGA AAAGGAATGT CAACACCTCT TACATTAGGT AATGGTTCAG 420

GAAATGGTTA CAGAGTTACA GCAGGTCTGG ACACAGCACG TTCAACGTTC ACCTTTACTT 480

CAGTGCCCTT TCGTAATGGC AGCGGGATAC TGAATGGCGG GGATTTCCGG ACCACGGCCA 540

GTATGAGCAT GATTTATAAC TGAGTCATAC CCAAATGAAT AACTGTAATT ACGGAAGTGA 600 

TTTCTGATGA AAAAATGGCK CCCTGCTTTT TTATTTTTAT CCCTGTCAGG CTGTAATGAT 660

GCTCTGGCTG CAAACCAGAG TACAATGTTT TACTCGTTTA ATGATAACAT TTATCGTCST 720

CAACTTAGTG TTAAAGTAAC CGATATTGTT CAATTCATAG TGGATATAAA CTCCGCATCA 780

AGTACGGCAA CTTTAAGCTA TGTGGCCTGC AATGGATTTA CCTGGACTCA TGRTCTTTAC 840 

TGGTCTGAGT ATTTTGCATG GCTGGTTGTT CCTAAACATG TTTCCTATAA TGGATATAAT 900 

ATATATCTTG AACTTCAGTC CAGAGGAAGT TTTTCACTTG ATGCAGAAGA TAATGATAAT 9 60 

TACTATCTTA CCAAGGGATT TGCATGGGAT GAAGCAAACA CATCTGGACA GACATGTTTC 1020

AATATCGGAG AAAAAAGAAG TCTGGCATGG TCATTTGGTG GTGTTACCCT GAACGCCAGA 1080

TTGCCTGTTG ACCTTCCTAA GGGGGATTAT ACGTTTCCAG TTAAGTTCTT ACGTGGCATT 1140
CAGCGTAATA ATTATGATTA TATTGGTGGA CGCTACAAAA TCCCTTCTTC GTTAATGAAA 1200

ACATTTCCTT TTAATGGTAC ATTGAATTTC TCAATTAAAA ATACCGGAGN ATGCCGTCCT 1260

TCTGCACAGT CTCTGGAAAT AAATCATGGT GATCTGTCGA TTAATAGCGC TAATAATCAT 1320 

TATGCGGCTC AGACTCTTTC TGTGTCTTGC GATGTGCCTA CAAATATTCG TTTTTTCCTG 1380 

TTAAGCAATA CAAATCCGGC ATACAGCCAT GGTCAGCAAT TTTCGGTTGG TCTGGGTCAT 1440

GGCTGGGACT CCATTATTTC GATTAATGGC GTGGACACAG GAGAGACAAC GATGAGATGG 1500 

TACAGAGCAG GTACACAAAA CCTGACCATC GCAGTCGCCT CTATGGTGAA TCTTCAAAGA 1560

TACAACCAGG AGTACTATCT GGTTCAGCAA CGCTGCTCAT GATATTGCCA TAAATGGTTT 1620

ATCCGGAGCC GGATAGTGTG TTGTGGATAT CTGGCATGCC CCGGGAAGTC ACCTTTCAGA 1680

CGGGCGGAGG GCTGGTGAAT TATCCGCGAT TACTGAGCAG TATGGATAAT CCTTTTTCAC 1740

AGACTTGTCA GCAGCCAGCA TTTATGTTCT TTTATCTGAG GGAATTTATC TGTACGCTGT 1800

GCCGGGATAT CTCAGTTATA CAGAAATCAG GCAGGAATAA ATTGTAGTGG AAAGTCGATG 1860

TTTACCGGAT GACTGATGCG CGCTTGTACA CAGACAGTGT GTTTCAGTAA TATGGAGAAT 1920 

AATGAAATGA ATAACACAGA CACATTAGAA AAAATAATCA GACACCAAAA AAACAAAGAC 1980 

CCCGCATATC CTTTCGGGAA CATTTGTTGA TGCAGCTCTG TATTCGCACA AATAAAAGAA 2040

TGCAGGATAA TATATCTGAA TTTCTGGGGG CGTATGGAAT AAATCACTCA GCATATATGG 2100 

TCCTCACCAC ATTATTCGCA GCGGAGAACC ATTGTCTGTC ACCTTCAGAG ATAAGCCAGA 2160 

AACTTCAGTT TACCAGAACT AATATTACCC GCATTACAGA TTTTTTAGAA AAAGCCGGAT 2220

ATGTAAAAAG GACGGATAGC AGGGAGGATC GCCGTGCTAA AAAAATCAGT CTGACATCTG 2280

AAGGTATGTT TTTTATTCAG AGGCTCACTC TTGCACAAAG CATGTATCTG AAAGAAATCT 2340

GGGATTATCT GACCCATGAT GAACAGGAAC TGTTTGAAGT CATTAATAAA AAATTACTGG 2400

CACATTTTTC TGATGCCAGC TCATAAAGTG CGAAATATCT GAGGATGCCG GATAGCTTCA 2460

GGCAAAATAA TAATGATTCT TGCAGATGTG TTTTTCCGGA TACAAAAACA AATGATAAAA 2520

ATTGCAGCGC CAGGCACCTT TCAAAGCAGG GAGACCTGTA CCGCGTCGAA AATTTCAGCC 2580

AGTTAATATC ATTGTCTGAA CCAGGCACTT TGCCCGGGCA GGAGAAGGAG TTGTGGCGGT 2640

CTCAGCCCGG AACAATTTGA AAACCATAAT CTCGCTTAGG GCCGTGTCCA CATTACGTGG 2700

GTAGGATCAC TCCTGGATTT TCTCTTTTTG GACATTGACG TCTCCATTGG TTTAAACACG 2760

GCAATGGAGA CTGCGGTGAA AAGAGTTAAT TCCCGGAGTG ACTGGCTGGA TGCCAATCAA 2820

TGATCGGAAG CATGCCAAAC TGTGAACGGA GATGGATGCC GCCAAATCAT GATCGATTCA 2880

GATGCCATAT TTGCAATATC GCGTTAATCG TCAGTTCAGC 2920 
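The data lines in these listings follow the convention of up to six ten-base blocks per line, with the cumulative position of the last base printed at the right. Because several of those position numbers were split or misread during text extraction, the following minimal Python sketch recomputes the running count and reports any line where it disagrees with the printed number. It assumes the input is the data lines of a single sequence only (no header lines containing bare numbers); the function name `check_positions` is hypothetical and is not part of the original listing.

```python
def check_positions(lines):
    """Compare each line's trailing position number with the running base count.

    Assumption: every data line consists of whitespace-separated sequence
    blocks followed by a cumulative position number. Lines that do not end
    in a number (headers, blank lines) are skipped.
    Returns a list of (line_index, counted, stated) for disagreeing lines.
    """
    counted = 0
    mismatches = []
    for i, line in enumerate(lines):
        tokens = line.split()
        if len(tokens) < 2 or not tokens[-1].isdigit():
            continue  # header, blank line, or line without a position number
        *blocks, stated = tokens
        # Every character in the blocks counts as one position, including
        # IUPAC ambiguity codes such as N, R, Y, K, M and S.
        counted += sum(len(block) for block in blocks)
        if counted != int(stated):
            mismatches.append((i, counted, int(stated)))
    return mismatches
```

For example, a line whose position was split by extraction, such as `ACGTACGTAC 1 0`, is reported, because the stray `1` is counted as a residue and the stated position reads `0`.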
(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1678 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GGTAAGGAAG TTATATATAT GAGCAACTAT ACATCTTAGA TGTATGATAA AGAAAAAGAT 60 

AACAGTTCTT TAGAATATGT ATATTGAAGA GAATGCAATA GCATGGTTTA TATAAATTAC 120 

GCATAAAAAT AAGCATATGT AAGCATTTTG GTTTGCTTTT TTTAACCTGC CACCGCAATG 180 

AATGCTTTTT TTATGTTAAT GTGCGTTATG AAACTAAATG CAAGAAACAT ATTTAAAGGA 240

TTAATATCGT TCTCTCACAG ACTCCGTTTA CTTATTCAAG AATATAATTT AATTTATAGT 300 

GAGCTTATTA TGAATATGAA CAATCCATTA GAGGKTCTTG GGCATGTATC CTGGCTCKGG 360

GGCCAGTTCC CCATTACACA GAAACYGGCC AGTTTCTTTG TTTGCAATAA ATGTATTACC 420

TGCAATACGG GGCTAACCAA TATGCTTTAT TAACCCGGGG ATAATTACCC TGTTGCATAT 480

TGTAGTTGGG GCTAATTTAA GTTTAGAAAA TGAAATTAAA TATCCTAATG ATGTTACCTC 540

ATTAGTCGCA GAAGACTGGA CTTCAGGTGA TCGTAAAKGG TYCATTGACT GGATTGCTCC 600

TTTCGGGGAT AACGGTGCCC TGTACAAATA TATGGGAAAA AAATTCCCTG ATGAACTATT 660

CCGAGCCATC AGGGTGGATY CCAAAACTCA TGTTGGTAAA GTATCAGAAT TTCACGGAGG 720

TAAAATTGAT AAACAGTTAG CGAATAAAAT TTTTAAACAA TATCACCACG AGTTAATAAC 780

TGAAGTAAAA AACAAGACAG ATTTCAATTT TTCATTAACA GGTTAAGAGG TAATTAAATG 840

CCAACAATAA CCACTGCACA AATTAAAAGC ACACTACAGT CTGCAAAGCA ATCCGCTGCA 900

AATAAATTGC ACTCAGCAGG ACAAAGCACG AAAGATGCAT TAAAAAAAGC AGCAGAGCAA 960 

ACCCGCAATG GGGGAAAACA GACTCATTTT TACTTATCCC TAAAGATTAT AAAGGACAGG 1020 

GTTCAAGCCT TAATGACCTT GTCAGGACGG CAGATGAACT GGGAATTGAA GTCCAGTATG 1080

ATGAAAAGAA TGGCACGGCG ATTACTAAAC AGGTATTCGG CACAGCAGAG AAACTCATTG 1140

GCCTCACCGA ACGGGGAGTG ACTATCTTTG CACCACAATT AGACAAATTA CTGCAAAAGT 1200 

ATCAAAAAGC GGGTAATAAA TTAGGCGGCA GTGCTGAAAA TATAGGTGAT AACTTAGGAA 1260

AGGCAGGCAG TGTACTGTCA ACGTTTCAAA ATTTTCTGGG TACTGCACTT TCCTCAATGA 1320

AAATAGACGA ACTGATAAAG AAACAAAAAT CTGGTAGCAA TGTCAGTTCT TCTGAACTGG 1380

CAAAAGCGAG TATTGAGCTA ATCAACCAAC TCGTGGACAC AGCTGCCAGC ATTAATAATA 144 0 

ATGTTAACTC ATTTTCTCAA CAACTCAATA AGCTGGGAAG TGTATTATCC AATACAAAGC 1500 

ACCTGAACGG TGTTGGTAAT AAGTTACAGA ATTTACCTAA CCTTGGATAA TATCGGTGCA 1560 

GGGTTAGATA CTGTATCGGG KATTTTATCT GCGRTTTCAG CAAGCTTCAT TCTGAGSCAT 1620

GCAGATGCAG ATACCGGRAC TAAAGCTGCC AGCAGGTGTT GGATTNACCA ACGGAANT 1678
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2676 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

AAGGATTACT TTGGAATCTG ACAACAAAGT TACTATGAAA AAGAACTAAC AAAGTTATAT 60 

AATGACGCTA AAAATGCTTT GAAAGATGTG CAATCTAAAG CAAATAGGTT AATTTCTGAT 120 

AATAAGANAA AACATAAGAG TGAACTAAAA AACATTTCTT ATGAATTCCA ATCAACTAAT 180 

CTCAATGGCA AAGATACTGC GTATATATTG GATGTARAAA GAAATCTAGA AAGTAAAATT 240 

GAGAATACTT CAAACGAATG AGTGTAATGA AATAAGAAAA CTAACCGACC AGATTGCAAT 300 

AATTAGTGAT AGTACCACTT CTGAAAATTT ATCATCGGCT CAAGTAACTG AAGCAATCGA 360

AACTGAACTT GAACATTTAC GAGACCAACA AGCAAATAAC GCAGAGTTAA TACTACTTGG 420

CATGGCTCTT TCTGTAGTAC ATCATGNATT TAATGGTAAT ATTAGGGCAA TTAGAAGTGC 480

GCTAAGGGAA TTAAAAGCAT GGGCTGACAG AAATCCTAAG CTTGATATTA TATACCAAAA 540

AATCAGAACT AGTTTTGATC ACTTAGATGG TTATTTAAAA ACCTTTACAC CATTGACAAG 600 

ACGTTTAAGT CGCTCTMAAA CCAATATAAC TGGAACTGCC ATTTTAGAAT TTATCAGAGA 660 

TGTATTCGAT GATCGTCTTG AGAAAGAAGG AATTGAATTA TTCACTACCT CAAAGTTTGT 720 

TAATCAAGAA ATTGTAACTT ACACATCAAC CATTTACCCT GTCTTTATAA ATCTAATTGA 780

TAACGCAATA TACTGGCTTG GGAAAACAAC TGGAGAAAAA AGACTTATAC TTGATGCKAC 840

TGAAACAGGA TTTGTTATTG GTGATACTGG TCCCGGTGTT TCAACTAGAG ATCGAGATAT 900 

AATATTTGAT ATGGGATTTA CACGAAAAAC AGGAGGGCGT GGAATGGGAT TATTCATTTC 960 

CAAAGAGTGT TTATCTCGAG ATGGATTTAC TATAAGATTG GATGATTACA CTCCTGAACA 1020 

GGGTGCTTTC TTTATTATTG AGCCATCAGA AGAAACAAGT GAATAGCGGA TATAAATAAA 1080 

TGACAAGCTC TACTGATTTN CATAAACTTT CTGAAGACTG CGTTCGCCGT TTTTTACATT 1140 

CTGTAGTTGC TGTAGATGAC AATATGTCTT TTGGAGCTGG TAGTGATACT TTCCCTACAG 1200 

ACGAAGATAT TAATGCTTTA GTTGATCCCG ACGATGATCC TACACCAATA ATAACAGCAT 1260 

CAGCATCCCC AAGGATAGAA TCAACTAAAT CAAAAGCAAA GGTAAAAAAC CATCCTTTTG 1320 

ATTACCAAGC TCTAGCAGAA GCTTTCGCCA AAGATGGTAT TGCTTGTTGC GGATTATTAG 1380 

CTAAGGAAGG TGCGAATAAG CGGGGAAATT CTTCTCGGCT GACTCAGTCA TTTCATTTCT 1440

TCATGTTTGA GCCGATTTTT TCTCCCGTAA ATGCCTTGAA TCAGCCTATT TAGACCGTTT 1500 

CTTCGCCATT TAAGGCGTTA TCCCCAGTTT TTACTGAGAT CTCTCCCACT GACGTATCAT 1560

TTGGTCCGCC CGAAACAGGT TGGCGAGCGT GAATAACATC GCCAGTTGGT TATGGTTTTT 1620

CAGCAACCCC TTGTATCTGG CTTTGACGAA GCCGAACTGT CGCTTGATGA TGCGAAATGG 1680

GTGCTCCACC CTGGGCCGGA TGCTGGGTTT CATGTATTGG ATGTTGATGG CCGTTTTGTT 1740

GTTGGGTGGA TGCTGTTTCA AGG77CTTAC CTTGCCGGGG GGGTCGGCGA TCAGCCAGTC 1800

GAGATCCACC TGGGGCAGCT CCTCGGGGTG TGGCGCCCCT TGGTAGCCGG CATCGGGTGA 1860

GACAAATTGC TCCTCTCCAT GCAGCAGATT ACCCAGCTGA TTGAGGTCAT GCTCGTTGGC 1920 

GGGGGTGGTG AGCAGGCTGT GGGTGAGGGC AGTCTTGGGA TCGACACCAA TGTGGGCCTT 1980 

CATGCCAAAG TGGCACTGAT TGCCTTTGTT GGTCTGATGC ATCTCCGGAT CGCGTTGCTG 2040

CTCTTTGTTG TTGGTCGAGC TGGGTGCCTC AATGATGGTG GCATCGACCA AGGTGCCTTG 2100 

AGTCATGATG ACGCCTGCTT CGGCGAGGCA GCGATTGATG GTGTTGAACA ATTGGCGGGC 2160 

GAGTTGATGC TGCTCCAGCA GGTGGGGGAA ATTCATGATG GTGGTGCGGT CCGGCAAGGC 2220 

GCTATCCAGG GATAACCGGG CAAACAGACG CATGGAGGGG ATTTCGTACA GAGCATCTTC 2280

CATCGCGCCA TCGCTCAGGT TGTACCAATG CTGCATGCAG TGAATGCGTA GCATGGTTTC 2340

CAGCGGATAA GGTCGCCGGC CATTACCAGC CTTGGGGTAA AACGGCTCGA TGAGTTCCAC 2400

CATGTTTTGC CATGGCAGAA TCTGCTCCAT GCGGGACAAG AAAATCTCTT TTCTGGTCTG 2460

ACGGCGCTTA CTGCTGAATT CACTGTCGGC GAAGGTAAGT TGATGACTCA TGATGAACCC 2520 

TGTTCTATGG CTCCAGATGA CAAACATGAT CTGATATCAG GGACTTGTTC GCAGCTTCCC 2580 

TAAGAGTTTT AATGTTTGAA GAAAGAGATA TAATTACAGC ATCATCCCAC AAAGGAGATA 2640

TTACAATACC TTGACTGGGN TATTGCCAAG CGGATA 2676



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1485 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

AAATTTGTCC TCCGGNTCTT TTCCCGTGGA TACGGGCATT GAGACCCGAA AGGSCCTGTA 60 

TTTGCGACCG GAGAGGCATC CTGGGGGCTC AGTAAACCAG TGGTCGCTGT ATGGCGGGGC 120 

TGTGCTTGCC GGTGATTATA ATGNCACTGG SAGCCGGTGC CGGCTGGGAC CTGGGTGTGC 180 

CGGGGACCCT TTCCGCTGAT ATCACGCAGT CAGTAGCCCG TATTGAGGGA GAGAGAACGT 240

TTCAGGGAAA ATCCTGGCGT CTGAGCTACT CCAAACGGTT TGATAATGCG GATGCCGACA 300 

TTACGTTCGC CGGGTATCGT TTCTCAGAGC GAAACTATAT GACCATGGAG CAGTACCTGA 360






WO 98/22575 PCT/US97/21347


ACGCCCGCTA CCGTAATGAT TACAGCACTC GGGAAAAAGA GATGTATACC GTTACGCTGA 420

ATAAAAACGT GGCGGACTGG AACACCTGTT TTAACCTGCA GTACAGCCGT CAGACATACT 480 

GGGACATACG GAAAACGGAC TATTATACGG TGAGCGTCAA CCGCTACTTT AATGTTTTCG 540

GACTGCAGGG TGTGGCGGTT GGATTGTCAG CCTCAAGGTC TAAATATCTG GGGCGTGATA 600 

ACRRTTCTGC TTACCTGCGT ATATCCGTGC CGCTGGGGAC GGGGACAGCG AGCTACAGTG 660 

GCAGTATGAG TAATGACCGT TATGTGAATA TGGCCGGCTA CACTGACACG TTCAATGACG 720 

GTCTGGACAG CTACAGCCTG AACGCCGGCC TTAACAGTGG CGGTGGACTG ACATCGCAAC 780 

GTCAGATTAA TGCCTATTAC AGTCATCGTA GTCCGCTGGC AAATTTGTCC GCGAATATTG 840

CATCCCTGCA GAAAGGATAT ACGTCTTTCG GCGTCAGTGC TTCCGGTGGG GCAACAATTA 900 

CCGGAAAAGG TGCGGCGTTA CATGCAGGGG GAATGTCCGG TGGAACACGT CTTCTTGTTG 960

ACACGGATGG TGTGGGAGGT GTACCGGTTG ATGGCGGGCA GGTGGTGACA AATCGCTGGG 1020 

GAACGGGCGT GGTGACTGAC ATCAGCAGTT ATTACCGGAA TACAACCTCT GTTGACCTGA 1080 

AGCGCTTACC GGATGATGTG GAAGCAACCC GTTCTGTTGT GGAATCGGCG CTGACAGAAG 1140 

GTGCCATTGG TTACCGGAAA TTCAGCGTGC TTAAAGGGAA ACGTCTGTTT GCAATACTGC 1200 

GTCTTGCTGA TGGCTCTCAG CCCCCGTTTG GTGCCAGTGT AACCAGTGAA AAAGGCCGGG 1260 

AACTGGGCAT GGTGGCCGAC GAAGGCGTTG CCTGGCTGAG TGGCGTGACG CCGGGGGAAA 1320 

CCCTGTCGGT AAACTGGGAT GGAAAAATAC AGTGTCAGGT AAATGTACCG GAGACAGCAA 1380

TATCTGACCA GCAGTTATTG CTTCCCTGTA CGCCTCAGAA ATAAATGAAA GTCCGGAATA 1440

TTAACGGCTG ATTGAATTGC GGTTTATGCC ATTTTCCCGG ACCAA 1485
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22671 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



TTACCAATTT CATCGTCCGG TACATCCTCC AGAACATCTC GCAATAAACT CTCGTCTGCC 60

TCATTCCATG CCACACCAGC ATTTGGGAAA CGAGGATCGA TCTCTCTTTC CTTCTTCTCC 120

TTCTTACTTT GCTCTTTTCG GGATGATACA GATACGACAG AACGTTCTTT TACCGCTGTA 180

ATTGCCATAA CTGCATTGAG CAGAGATCTG CGCTCCACAT CGTTCAGCAT TTTTCCTTCA 240

CAGATCAAAT CATTCAGGAT GTCAATGACT AGATTCAGAC TTTCTTCTGT TAGCTTCATA 300

TTTCAGACCT TGAAGTATGT AGATAATCAG CACAATTACT AATGTGATAA ATATCAGAAG 360

ATAATTTACA GGTAAACCGG AAAATACATC TGAAGAATAA AGGCCTCAGC TTAACGTTTC 420

AGCCAGTTTG TGAGCTGATT GAGGTACGGC GATGACATTA AGGGGAATTA CTCGCGTATA 480

GCTGTGAGCT TATTTTTGAC CCTGGCAACA TATGGTGGCT ACTGCGCATG GTTTTGGAGT 540

AGATATCTTA CTACTCGTAG AATTGTGCTT ACTGGTCAGG CGAGCGCACA GGCATTCCGT 600

GCAATGAATA GAACACTGGT TTTTTAGTCT TCCGTTACGC ATGAGGATGT TAGTGCAGAT 660

TCCGGTGTAT TCGATCAGTT GTTCGGCGAA TCAGCGATCG ATGACGATGC GATTTCGTAT 720 

GTTAGGGATG CTGGTATGAT TACTCGCTGA AAAATAATGT GAAAAGGCAG TTTTTCTTTA 780 

GACATTTAGC TCATTCATGC TGTTGTTTTA CGTTTTGCTG TCGTGTGCAG GATTATGTTT 840

TCGTTACGGG ACGATTCATT CCGTTTTAAT CAGGAGCTAT TGGCGTTGCT CATTGGTGGG 900 

ATGCCGTAAA GTTTTACCGC GGCGATTAAT GATGTGAAGT CAATCCAAAT CAACGGAGAT 960 

CTCTCATCAT GAATCAACCA ATACACAATG ATTACTGGTT ATCCCGTTTT GAAAGTATTC 1020

TCAACAGTGC CCTGGTGCAA CACCGTGCCG TCTCGTTAAT CTGGGTGGAT TTACGTTTCC 1080 

CTGAGCATAT GCCTGTCACC ATCATGGATC CCGATCCGGA TTCAGCGGTG ATTTCTCGTT 1140

TTTTCGAATC CCTGAAAGCC AAAATTCAGG CTTACCAGCG GAAAAAACGA CGTACCAACA 1200 

AGCGTGTGCG TGCAACCACC CTGCATTATT TCTGGTGTCG GGAGTTTGGC AAGGAAAAAG 1260 

GCAGGAAACA TTATCACGTG ATATTACTGC TCAACAAAGA TACCTGGTGC TCGCCAGGGG 1320 

ATTTCACCGT TCCTTCTTCG CTGGCGACGC TGATCCAACT GGCATGGTGT AGCGCTCTGC 1380 

ATCTTGAGCC CTGGCAGGGT AATGGACTGG TTCATTTTTC CAGGCGGACG CYTTTCCGTA 1440

AACCGGTATC ATCTGATGCT CGCCCTTCTT CCGATGATAC GCCTTTGTCG GGTGGATGTT 1500 

CTGAAACCAG GAAGGCTTCA GACAAAAAGC CGGGTGAAGC CGCTGTTCTC TGGATCAAGC 1560 

GTGGTGATGT GGAAGCGATG CAGAAAGCCA TGGAGAGAGC CCGTTATCTC GTGAAGTATG 1620

AGACGAAGCA GCATGACGGT TCTGGTCAAC GTAATTATGG TTGCAGCCGT GGAGCGGGGC 1680 

GTCTACTGGA TGGCAGGTGA ACCCTGTAAA ACGGCATCCG GTGCCAGAGT ATATGTCACA 1740

GTAAGGGCGT GGTTGATGCC CTTAGCTCGT TTTCTGAAAA AGTCGTCCTG AAGTCATGTG 1800 

TCACGAACGG TGCAATAGTG ATCCACACCC AACGCCTGAA ATCAGATCCA GGGGGTAATC 1860

TGCTCTCCTG ATTCAGGAGA GYTTATGGTC ACTTTTGAGA CAGTTATGGA AATTAAAATC 1920 

CTGCACAAGC AGGGAATGAG TAGCCGGGCG ATTGCCAGAG AACTGGGGAT CTCCCGCAAT 1980

ACGGTTAAAC GTTATTTGCA GGCAAAATCT GAGCCGCCAA AATATACGCC GCGACCTGCT 2040

GTTGCTTCAC TCCTGGATGA ATACCGGGAT TATATTCGTC AACGCATCGC CGATGCTCAT 2100

CCTTACAAAA TCCCGGCAAC GGTAATCGCT CGAGAGATCA GAGACCAGGG ATATCGTGGC 2160 

GGAATGACCA TTCTCAGGGC ATTCATTCGT TCTCTCTCGG TTCCTCAGGA GCAGGAGCCT 2220

GCCGTTCGGT TCGAAACTGA ACCCGGACGA CAGATGCAGG TTGACTGGGG CACTATGCGT 2280 

AATGGTCGCT CACCGCTTCA CGTGTTCGTT GCTGTTCTCG GATACAGCCG AATGCTGTAC 2340

ATCGAATTCA CTGACAATAT GCGTTATGAC ACGCTGGAGA CCTGCCATCG TAATGCGTTC 2400

CGCTTCTTTG GTGGTGTGCC GCGCGAAGTG TTGTATGACA ATATGAAAAC TGTGGTTCTG 2460

CAACGTGACG CATATCAGAC CGGTCAGCAC CGGTTCCATC CTTCGTTGTG GCAGTTCGGC 2520

AAGGAGATGG GCTTCTCTCC CCGACTGTCT CGCCCCTTCA GGGCACAGAC TAAAGGTAAG 2580

GTGGAACGGA TGGTGCAGTA CACCCGTAAC AGTTTTTACA TCCCACTAAT GACTCGCCTG 2640

CGACCGATGG GGATCACTGT CGATGTTGAA ACAGCCAGCC GCCACGGTCT GCGCTGGCTG 2700 

CACGATGTCG CTAAGCAACG AAAGCATGAA ACAATCCAGG CCCGTCCCTG CGATCGCTGG 2760

CTCGAAGAGC AGCAGTCCAT GCTGGCACTG CCTGCGGAGA AAAAAGAGTA TGACGTGCAT 2820

CCTGGTGAAA ATCTGGTGAA CTTCGACAAA CACCCCCTGC ATCATCCACT CTCCATTTAC 2880

GACTCATTCT GCAGAGGAGT GGCGTGATGA TGGAAGTGCA ACATCAACGA CTGATGGCGC 2940

TCGCCGGGGA GTTGCAACTG GAAAGCCTTA TAAGCGCAGC GCCTGCGCTG TCACAACAGG 3000 

CAGTAGACCA GGAATGGAGT TATATGGACT TCCTGGAGCA TCTGCTTCAT GAAGAAAAAC 3060 

TGGCACGTCA TCAACGTAAA CAGGCGATGT ATACCCGAAT GGCAGCCTTC CCGGCGGTGA 3120 

AAACGTTCGA AGAGTATGAC TTCACATTCG CCACCGGAGC ACCGCAGAAG CAACTCCAGT 3180 

CGTTACGCTC ACTCAGCTTC ATAGAACGTA ATGAAAATAT CGTATTACTG GGACCATCAG 3240

GTGTGGGGAA AACCCATCTG GCAATAGCGA TGGGCTATGA AGCAGTCCGT GCAGGTATCA 3300 

AAGTTCGCTT CACAACAGCA GCAGATCTGT TACTTCAGTT ATCTACGGCA CAACGTCAGG 3360

GCCGTTATAA AACGACGCTT CAGCGTGGAG TAATGGCCCC CCGCCTGCTC ATCATTGATG 3420 

AAATAGGCTA TCTGCCGTTC AGTCAGGAAG AAGCAAAACT GITCTTCCAG GTCATTGCTA 3480

AACGTTACGA AAAGAGCGGA ATGATCCTGA CATCCAATCT GCCGTTCGGG CAGTGGGATC 3540

AAACGTTCGC CGGTGATGCA GCCCTGACCT CAGCGATGCT GGACCGTATC TTACACCACT 3600

CACATGTCGT TCAAATCAAA GGAGAAAGCT ATCGACTCAG ACAGAAACGA AAGGCCGGGG 3660
TTATAGCAGA AGCTAATCCT GAGTAAAACG GTGGATCAAT ATTGGGCCGT TGGTGGAGAT 3720 
ATAAGTGGAT CACTTTTCAT CCGTCGTTGA CATCATGCAA TGTTTCCTGG TTTTCATGCA 3780 
TCCATCATTT GTCGCTGCGA TGCCAGACTT CTGGATGCAC ACATGTTGTT TTACTTTTGT 3840

CAGCATCATA AATGCGCCGG GACTGGTGAA TGGAGATAAG CGATTTTATT ATGGACGTCA 3900

GCGAACATAC TCACCATGCC GGTATGTTCG TGAACTGAAC AATAAGTTTT GCGCTGATTA 3960

CAGTATGTGA AGGAGGTCCG TTACAATGAA TTCCGCTTAT ATGCAATCCT TGCAGACATC 4020 
CCACCACTTC CCAGCTGATT TAACCTACAG ATTATTTCCT AGTGAGCTTG GATATCTCAT 4080

TGACGACTTA TATGAAAGTA CCCAACTTCC GCTGGAGCTC ATTTTTAATA CTGTACTGGC 4140

AACGCTGTCA CTCTCCTGTC AGTCACTGGT TGACGTTGTT CATCCTCACA CCAAGATGCC 4200

GGAACCCTGC TCACTTTATC TGTTGGCAAT CGCAGAGCCA GGCGCGGGAA AAACAACGAT 4260

AAACACACTG GTGATGAACC CCTG7TACGA A7TTGCCGAT CGACTCATTC AACAATACGA 4320

AGAGAGAAAC AAAGATTATA AGAC7GAACT ACAGATCTGG AATACCCGGC AGAAAGCGCT 4380

TGCTGCCAA? TTAAGAAAGG CTGT7AACCG GGGGTATCCG GGGGAACAGG AAGAAGAGGC 4440

GCTGCGTAAT CACGAAAGAA ATAAACCGAC ACGTCCGGTT CGACCGAATT TTATCTATGA 4500

AGATGTTTCG CTTAAAGCGC TTGTGGAAGG GCTCAATGAA CATCCTGAGG CAGGGGTTAT 4560

TTCTGACGAG GCGGTCACTT TTTTCAGAAG CTATCTGAAA AATTATCCGG GCCTGTTGAA 4620

TAAAGCATGG AGTGGACAAC CGTTTGATTT TGGACGGGCT GACGAGAAAT ACCATATCAC 4680

GCCACGTCTG ACATTTTCGT TAATGTCCCA GCCGGATGTC TTTACGAATT ATATAAATAA 4740

AAATGACGTA CTGGCGTGGG GAAGCGGATT TCTTTCCCGG TTTCTGTTCA GTCAGACCGG 4800

AAGTCCTTCC CGGGTACGGG ATTATACGAG AGGCGAGTTC AGAACAAAAC CAACCCTGGA 4860

GAAGTTTCAT AAAAAGATTA ACGGATTTCT GTTAAGCCAT AACATTAATT CCCCCGGTAT 4920

GAGCACCGAA AGGAAAACAT TAAAACTTGC AAAGAAAGCG TTGGGGGAGT GGCAGGAAAA 4980

CCAGATTAAG ATTGAAAGAA AAGCGCTTGC AGGAGGGGAG TGGGAACACA TCAGAGATAT 5040

TGTTCTGAAA GCAGGTTCTA ATATACTGAG GATAGCTGGA ATATTCACCT GCTATTGCTA 5100 

TAAAGATGCT GAGGAAATTG AATCAATTGC GCTTTTTAAA GCTATGCATC TCATGGGCTG 5160 

GTATCTGGAG GAGGCGAGCA CAATATTTTA TCCCATGTCT GCACGATGCC AGTTTGAACA 5220

GGATGCCTGT GAACTGTATG CATGGATTAT GACCCGAATA AGGCAGAATA ATTGGCGTGC 5280

TATCAGGAAA ACAGACATTG AAAGATATGG TCCCAATCGT CTGAGAAGAG CAGAAAAACT 5340

TACACCTGTA CTCAATCAGT TAATCGYTCA GAATTATTTC CGTATCATCM AAGATGCGAT 5400

CGCATCAGGC ACTTTATGTT TCTGCTCTTG ATAATAATGG TTACATCCTT CCTTTCGGCG 5460

CAATGTCTTA CGAACCGTTT GATATTGTTC CACCCCAGTA TAACCATAAT GCGAAAACAT 5520

ATTCCGTTGT TATTCCACCG GCATTAATTC AGTCATTTAC ACCTGATTCC TCAGCTTACA 5580 

CCTTATTTTA AAACAATTTT GTGAGTAGAA AACGAAAATC ATAATCCTTC GAATGAAGGT 5640

TAATGATAAG GTGTGTTGCA TATCCTGCAC CTGTGCAAAT ATTCACCAAT CATTGGGTGT 5700 

GAATGAAAAT TTCTCTGAAA AAATCGCTAT GGTAGCAACA GTAGCAGCAC ATACACTACA 5760

TCTGTGATTT GGTTTTGTTT TCATAATGAC CTGCTGTCAG AGCTGATTGA ATGCTGGGAT 5820 

GTGCGCACTG GTGGAAGAGT GGTTTTCGTT TCAGATATAA CGAAAGGTAA TCGAAAGATT 5880 

GTTTTAAACA TGGATTAAAG CTAATAATTA ACCATATTGT GTGAGTTTTT ATATATAAGT 5940

TTGTTTGATT CTTGCCGTGA TGAGTGCTGG GGTATATGAC GATGTCGCTC TCTTTCTGAA 6000 

TAACAAATTA TTATTCGTCT GTTACTGATA AGGGATGCGA TTCATGTTTT AATAGAGGGT 6060 

TGAAGAAAAT TAATTTGATA TTTTTTTGTA AGGGAATGGA ACTGTCCGGA ATATGTTCAG 6120 

AACGGCGGAT TTCTCATTTC CATTCATTAA ACATGGATAA TTTTAATTTA GGTTTATTAC 6180 

TATTATTATA CTCACTCCCT TTTTCATACA ATCTCTATTG TTATTTACTT CCTGTCTTTA 6240

CTCACTCTCT ATCTTTACGA TTATATTCAC TCTATCGTTA CACATTCCAT TAGTATTACT 6300

CTTGTTATCG TATTCATTCC ATCCCTCAAT CATATTTACT GTAACTCATA TGATGTTCAG 6360 

GTAAGTTATT CTCTACCATT CTACTGATGA TATCCATCTG TTCTCATTTT CAGTGAAACA 6420 

GCAATTGATT TTAATCTTAT CCATCATGAA CTGTATTTGC TTAACAATGA TTGTTTATCT 6480

GAAGTGTTTT AACTATTCTG GTTGGAAACA ATTTCTCTGT CATCACAGAT TAACTGAATG 6540

TTTACTCTTT GATAAGGTAT CCATGATTCC GTCATGTTTA ACAGCGCAGG ATAAACAACA 6600

GAATTAACAG AGTGAATTTC TGATTATATT TGTTGCCGGT TGTATTGTTT AAGGTACTGG 6660

GTGAAAATTA TTCATCCATG GTATGTTGTC TTATGCTATC GTGTGTCGTT AACGTTCATA 6720

TCCTGGAGAA CAGATTGAAT GAGCGCATAT AAGTTTATTG CATTGGCCTT GTACACGGTT 6780

TTTACAACCA CTGAGAGCAA GTTTGTAGTT TATGATGTGA TTGGTCGCAA TATGTTTCTT 6840

AACCTTCTGG TGGTGGTGTT TTATCGCGTA TTTTGCAGTA TTTCGTGATG TTTTATTGAG 6900 

TCTGTATTTT CTTTACTCCT CGTTTATCTC ATCTCTTTAG CTAATACCAT CAGATAATCC 6960 

ATTTCTTTCT GCATAATGCT GCGTATCGTT AATAACCCGT CGTATCCATT CTGCTACAGC 7020 

ATGCCTGATA AATACCATCT GTAAGTTATT ACCGTTTTAG ATCTGATTAT GAGCGAAAGC 7080

ATTAATTCGT TCACAGAGCT TAAAACATCA TTAACTTTCA GGAGTCATCA ACATGCCTAA 7140

ATCTTACACA CCAAACTGGT TTTTTACCGC TTTACTTGAC AATCACATCA ATCAAATGAT 7200

GGCACGCTAT TCCTGCCTGC GGGCCTTACG CATGGATTTC TTCTACAGGA AAGATACGCC 7260

CGATTTCTTA CAACCTGATC ATCGCTGGCT TGAATTGCAG TTGCGTATGA TGCTGGAGCA 7320

GGTGGAACAA TTTGAAAATA TCGTTGGCTT CTTCTGGGTG ATTGAATGGA CGGCTGATCA 7380 

TGGTTTTCAT GCGCATGCGG TTTTCTGGAT CGATCGTCAG AGGGTTAAAA AAATATATCC 7440

CTTTGCGGAG CGGATTACGG AATGCTGGCG GTCTATTACG CATAACAGCG GTTCGGCACA 7500

CCGCTGCACA TATCAGCCGC ATTATACATA CAACATCAAC ATTCCTGTGC GCCACAACGA 7560

TCCTGAAAGC ATCGATAATA TTCGCGGTGC CCTGCATTAT CTGGCGAAAG AAGAGCAAAA 7620

AGACGGGCTG TGTGCTTACG GGTGCAATGA AGTTCCTGAA CGTCCTGCTG CAGGGCGTCC 7680 

TCGTAAGCCT CACTTCTGAA GCTTAAGGCC TGAGCCTTCG CTCCTGGAAA GACTCCGTCG 7740

GTAAAAACTT ACCGCCTTGA TTAATGATGT GAACTGAAGT CAACGGAGAT CATTCATCCT 7800

GAACCTGCAT CCGGTGTTTT GTTCCTTGTC TTCCCGTTCT GCTTCGGTTC TTCACTTATT 7860

CCATCAATCT CATTCCGCAA GCCATAACAC GTCAGCTCAT TCACGGGCAG GACGCATTGT 7920

GGGCTGCGCA TAACGGAACA TATCTTATGA ATGCTATTCC TTATTTCGAC TATAGCCTGG 7980

CACCCTTCTG GCCATCTTAT CAGAACAAAG TCATCGGCGT CCTTGAGCGT GCGCTGCGTG 8040

AGCAGTCCGG CTCACGGATA CGGCGGATCC TGCTTCGTCT GCCGTGGGAA CATGACAACG 8100 

CCTTCAGCAG CAGAAAGATC TGGTTCGGTA TGGACTTTAT CGAAACCGTC AGTGGGCTGA 8160

TGAATGGGAA AGGCGGACGG GACGTTTGCT GGCTGCTGAC CGGTGATCCG GAAAAGCCGG 8220

AATACCACGT GGTGCTGTGC GTCAGACAGG AGTATTTCGA CGGCCCCGAA CTGGATCGGT 8280

TGATAGTGGA TGCCTGGAGT AATGTGGTGG GTTTGGCGTC AGGAGGTGAA GCAAAGCCGT 8340

ACCAGAAGCA GATCACCCGG GATGTGGTAC TGGATCGCCG GTGACCGGAC TGCGAAGCCC 8400

TGTTTAAGGA CCTTATCTGG GCGTTCAGTG ATTTCGCCCG GGATCGCCGT GGAGTGTGCG 8460

ATCCGGAAGC CCGTTGCCTT GCCGGCAATC CCGGTTGGCA GTGGTGAAAG CAGGACGCCA 8520 

TCCCATCCCC CGTATTACCC CATTCTTCAT AAATCTCACT GAGGACATTC TGACCATGTT 8580

GACCAGAACA AGCCACGACA GCGTATTGCT GCGTGCCGAC GATCCCCTGA TCGAGATGAA 8640

CTACATCACC AGTTTCACCG GCATGACCGA TAAATGGTTT TAGAGGCTGA TCAGTGAAGG 8700

GCATTTTCCT AAACGCATCA AACTGGGGCG CAGCAGCCGC TGGTACAAAA GTGAAGTGGA 8760

GCAGTGGATG CAACAACGAA TTGAGGAATC ACGAGGAGCA GCAGCATGAA ACGTGTTGTG 8820

ATGCCAGTAC GTTGGCAATG TGCAAAATGC CAGCGCTGGT ATTGTGGAAA TCAGCCCTGT 8880

CCCTGGTGCT GGCGACATTC CCGCTTATCT TTCCGGTGAC ACGCTCCGGT CAGCCAACTG 8940

TTAGTCATCA TTTCCTGACT GATTCGTCAT TCCATTCTTA TTGATTATAA CTGGGATTAC 9000 

ACCGGTGCTG GCGTGCTTTC CTGCGTGTCT GCACCGGTTT GACAAAATTC AACAGGGTTT 9060

GAAAAGGAAC ATTTCGTGCA AATAACCGAA GCCTTAATTT CAGAGCCGGG AGACATCCGG 9120 

CGTTTTATTC AACATGCTGT TGACCACTGG CCGCGTCTGC TGGCAGTCCA CTTCATACTC 9180 

CATTCGACAG AAGGAAACAT CTACGGGCAA CAGATTCATG CATTCTGCAG TTCCTTTTAT 9240

CGACAACTGC ATGAACGTAT TACTGAGAGC AATCACACTG CCAGTCCATC ATCGTCGGTG 9300 

GTATTACGCT GGTTGGGGGA ACAAGATGGA GGAGCAACAA TTCGATGCCT GTTGCTGCTC 9360 

AGCCAGACGA GTATTTGTCA CCCGCGAGCC AGTGTCACAG TTGATGAACA ATGTTCGCAA 9420

GTGGTGGATT TACTGCAACA TAGCTGGCAG GTGATAAGTG CTGGCGGACA ATGCCGGGTG 9480

GAAAGGTGTT TTCGGGTTGC CCGGGGTGAT ACATCCGGTC AGTATGTTGC GTTAAAAACA 9540

GTCGCATTGT CTCTGGGGTT ACCGGTTGTG ACCGCCATTA CCGATCGTCC GGTACAGCGC 9600 

TGTACATTGA TTACAGCTCA GTGAATCAGC GCTTTCTGGC TTTTCGTCGG TCATTCTGTC 9660 

AACGCCACGA TGTTTGACCG TTATGGGGAT GCGGACGATT CCCTGCACAG CGTTGTTTCA 9720

CGGTGGTGGA TGACGCAACA CCGCTGTTAA AAACAGTCGT TCAGTCCTTT GTGTTACCGG 9780

TTGTGACAAC AATCAGTTGG TAATGGACGT GTGAACCATC TGCGCTTCCG TTGATTTTTA 9840

TGGACTGATA AAGTTTTGCC AGCTGAATCT TTATACGGAA TGCTGTTCAG TATGCGTACA 9900 

CGAATTGACT ATGTGGCGGA TAAATACTCT TTTACCGAAC GGAATGAATC TCCACGCGTT 9960 

CGCCGGGAGT GGCAGGATGT TCTGGAGGAG TGTCGGCTGA CAGAGGCCGG ACCAGAAGAA 10020 

CGGCTGCGTA TTGCCCTGCT GAATGTGGAT TACGTCACCA GTTTTGAACT GCCTTTTCGC 10080

TTGTTGCTTA CTCGTACACG ACAACTGATT GCCGCGCTTC GGGAAGAATG GGGCCTCAGC 10140

CAGAAAAATG TGGTGTTCAA CGATAAACGG TTTGGCTGCG TGTACAGCCT GAAGGCCAGT 10200

CTTTCTGGTG TACCGGATAC ATTCCGGTAT CATCTGTCTC ATCGTATTCG CCGGATGGTT 10260

GGGAATGAAA ATACATCATC GCCATATCAG CAGATTGCCC GGGAAGTGAA AGTGCCCCGT 10320 

GAACGGCTGA AGTATGCGCT GGAAGCCGGT TTACTGGTGA CTGCACTGGA CGGGCTGTTC 10380 

TGGTCTGGTA GTCAGCGCAT TGCGGCTGAT ATCCTGAGAC TGAGAAAGAG CGGAATGCCG 10440 

GTGGTGACAA CGTCCGTGGA AGCGAGCGAT AACCTGACGG GAACAACCCG CAAAATACCG 10500 

GCATACCATC TCTGACATTG CGATGAAGGG CAGATTTCAG CTTGACAGGG GCAGAGTGCC 10560 

GCTTTTTATA CTTTATTCCC GTGTCTGAAA AAAATGTGCA AAGGAAACGG GAATGGCAAG 10620 

GTCCGATTAC GATTTTATCA ATCTGTCTCT GGGACATGAA CTGAATGAGT GGCTGGCAGA 10680 

GAGAGGTTAT GCCGGACAGG CGGATAACCG GAACCGACTG GCAGAGGTGG TTACCCGCAA 10740 

ATTGCGGGAC AGTTTTTATG CGGACGTCTC CTGGGATGCG CTGAATGTGG CATACAGTGA 10800 

ACACCCTGAG TGGTTTTCAG AGCTTGCCTC CGGGGATGAG GATTAACAGG CAAATTATGC 10860

TGCTATCGGG CAGAGTGATT ACCTGCAGGG ATTTCCATTT ATAAGAATAC GCCGCTTCGG 10920

GAAAGCTCCG GTTCTCCGGA GAGTTACGAT TATTTTTACT CAAATTCACA ACACCTGAAC 10980 

TGGAACTTGC GTTGTGTCCC GGATTGTTAC TCCGCAGAAG CATCCTTTTT ACCATACGGA 11040 

TGTTTGTTTT CCATTTCCCC TCCGAAAAAT ACAACTCCGA TCACATTTCT GATATTTTCC 11100 

CCGGATTTTA CATAACAGGA TTGTTTCTGT ATGTTTTTTA TCTGGTGTAA ATTTCAGCAC 11160 

TGACATTCCG CTTACGTTAA TTTACACTGG ATACCCCACG AGGAGAATAT GCAGCACCGG 11220

CAGGATAACT TACTGGCGAA CAGAAATTTG TTGCCTGGTA TGGTTTCCGG TCAGTACGCA 11280

TTCAGGATCC GTACCTTATC TCAGGTGGTA CGCTATTTTT CCCTCCTCCC CTGCCTTTGC 11340

ATTCTTTCAT TTTCGTCTCC GGCAGCCATG CTGTCTCCGG GTGACCGCAG TGCAATTCAG 11400

CAGCAACAGC AGCAGTTGTT GGATGAAAAC CAGCGCCAGC GTGATGCGCT GGAGCGCAGT 11460

GCGCCGCTGA CCATCACGCC GTCTCCGGAA ACGTCTGCCG GTACTGAAGG TCCCTGCTTT 11520 

ACGGTGTCAC GCATTGTTGT CAGTGGGGCC ACCCGACTGA CGTCTGCAGA AACCGACAGA 11580 

CTGGTGGCAC CGTGGGTGAA TCAGTGTCTG AATATCACGG GACTGACCGC GGTCACGGAT 11640 

GCCGTGACGG ACGGGTATAT ACGCCGGGGA TATATCACCA GCCGGGCCTT TCTGACAGAG 11700 

CAGGACCTTT CAGGGGGCGT ACTGCACATA ACGGTCATGG AAGGCAGGCT GCAGCAAATC 11760 

CGGGCGGAAG GCGCTGACCT TCCTGCCCGC ACCCTGAA3A TGGTTTTCCC GGGAATGGAG 11820 

GGGAAGGTTC TGAACTGCGG GATATTGAGC AGGGGATGGA GCAGATTAAT CGTCTGCGTA 11880

CGGAGCCGGT ACAGATTGAA ATATCGCCCG GTGACCGTGA GGGATGGTCG GTGGTGACAC 11940 

TGACGGCATT GCCGGAATGG CCTGTCACAG GGAoCGTGGG CATCGACAAC AGCGoGCAGA 12000

AGAATACCGG TACGGGGCAG TTAAATGGTG TCCTTTCGTT TAATAATCCT CTGGGGCTGG 12060

CTGACAAGTG GTTTGTCAGC GGGGGACGGA GCAGTGACTT TTCGGTGTCA CATGATGCGA 12120

GGAATTTTGC GGCCGGTGTC AGTCTGCCGT ATGGCTATAC CCTGGTGGAT TACACGTATT 12180

CATGGAGTGA CTACCTCAGC ACCATTGATA ACCGGGGCTG GCGGTGGCGT TCCACGGGAG 12240

ACCTGCAGAG TCACCGGCTG GGACTGTCGC ATGTCCTGTT CCGTAACGGG GACATGAAGA 12300

CAGCACTGAC CGGAGGTCTG CAGCACCGCA TTATTCACAA TTATCTGGAT GATGTTCTGC 12360

TTCAGGGCAG CAGCGGTAAA CTCACTTCAT TTTCTGTCGG GCTGAATCAC ACACACAAGT 12420

TTCTGGGTGG TGTCGGAACA CTGAATCCGG TATTCACACG GGGGATGCCC TGGTTCGGCG 12480

CAGAAAGCGA CCACGGGAAA AGGGGAGACC TGCCCGTAAA TCAGTTCCGG AAATGGTCGG 12540

TGAGTGCCAG TTTTCAGCGC CCCGTCACGG ACAGGGTGTG GTGGCTGACC AGCGCTTATG 12600

CCCAGTGGTC ACCGGACCGT CTTCATGGTG TGGAACAACT GAGCCTCGGG GGTGAGAGTT 12660

CAGTGCGTGG CTTTAAGGAG CAGTATATCT CCGGTAATAA CGGCGGTTAT CTGCGAAATG 12720

AGCTGTCCTG GTCTCTGTTC TCCCTGCCAT ATGTGGGGAC AGTCCGTGCA GTGACTGCAC 12780

TGGACGGCGG CTGGCTGCAC TCTGACAGAG ATGACCCGTA CTCGTCCGGC ACGCTGTGGG 12840

GTGCTGCTGC CGGGCTCAGC ACCACCAGTG GTCATGTTTC CGGTTCGTTC ACTGCCGGAC 12900

TGCCTCTGGT TTACCCGGAC TGGCTTGCCC CTGACCATCT CACGGTTTAC TGGCGCGTTG 12960

CCGTCGCGTT TTAAGGGATT ATTACCATGC ATCAGCCTCC CGTTCGCTTC ACTTACCGCC 13020

TGCTGAGTTA CCTTATCAGT ACGATTATCG CCGGGCAGCC GTTGTTACCG GCTGTGGGGG 13080

CCGTCATCAC CCCACAAAAC GGGGCTGGAA TGGATAAAGC GGCAAATGGT GTGCCGGTCG 13140

TGAACATTGC CACGCCGAAC GGGGCCGGGA TTTCGCATAA CCGGTTTACG GATTACAACG 13200

TCGGGAAGGA AGGGCTGATT CTCAATAATG CCACCGGTAA GCTTAATCCG ACGCAGCTTG 13260

GTGGACTGAT ACAGAATAAC CCGAACCTGA AAGCGGGCGG GGAAGCGAAG GGTATCATCA 13320

ACGAAGTGAC CGGCGGTAAC CGTTCACTGT TGCAGGGCTA TACGGAAGTG GCCGGCAAAG 13380

CGGCGAATGT GATGGTTGCC AACCCGTATG GTATCACCTG TGACGGCTGT GGTTTTATCA 13440

ACACGCCGCA CGCGACGCTC ACCACAGGCA AACCTGTGAT GAATGCCGAC GGCAGCCTGC 13500

GACGGCACCC 13560

GGAGCGATGC CGTATCCATT ATTGCCCGTG CAACGGAAGT GAATGCCGCG CTTCATGCGA 13620

AGGATTTAAC TGTCACTGCA GGCGGTAACC GGATAACTGC AGATGGTCGC GTCAGTGCCC 13680

TGAAGGGCGA AGGTGATGTG CCGAAAGTTG CCGTTGATAC CGGCGCGCTC GGTGGAATGT 13740

ACGCCAGGCG TATTCATCTG ACCTCGACTG AAAGTGGTGT CGGGGTTAAT CTGGGTAACC 13800

TTTATGCCCG CGAGGGCGAT ATCATAGTGA GCAGTGCCGG AAAACTGGTG CTGAAGAACA 13860

GCCTTGCCGG CGGCAATACC ACCGTAACCG GAAGGGATGT CTCACTTTCA GGGGATAACA 13920

AAGCCGGAGG AAATCTCAGC GTTACCGGGA CAACGGGACT GACACTGAAT CAGCCCCGTC 13980

TGGTGACGGA TAAAAATCTG GTGCTGTCTT CATCCGGGCA GATTGTACAG AACGGTGGTG 14040

AACTGACTGC CGGACAGAAC GCCATGCTCA GTGCACAGCA CCTGAACCAG AGTTCCGGGA 14100 

CCGTGAATGC AGCTGAAAAT GTCACCCTTA CCACCACCAA TGATACCACA CTGAAAGGGC 14160 

GCAGCGTTGC CGGGAAAACA CTCACTGTCA GTTCCGGCAG CCTGAACAAC GGTGGGACAC 14220

TGGTTGCCGG GCGCGATGCC ACGGTGAAAA CCGGGACATT CAGTAATACC GGTACCGTCC 14280

AGGGGAATGG CCTGAAAGTT ACCGCCACTG ACCTGACCAG CACCGGCAGT ATTAAAAGTG 14340

GCAGCACACT CGATATCAGC GCCCGCAATG CCACACTGTC CGGTGATGCC GGTGCAAAAG 14400

ACAGTGCCCG CGTTACCGTC AGCGGTACAC TCGAAAACCG CGGGAGACTT GTCAGCGATG 14460

AGGTGCTGAC GCTCAGTGGC ACGCAGATAA ACAAGAGCGG TACCCTCTCC GGGGCAAAGG 14520

AACTTGTGGC TTCTGCAGAC ACACTGACCA CCACAGAAAA ATCGGTCACA AACAGTGACG 14580

GTAACCTCAT GCTGGACAGC GCGTCTTCCA CACTGGCGGG TGAAACCAGT GCGGGTGGCA 14640

CGGTGTCTGT AAAAGGCAAC AGTCTGAAGA CCACGACCAC TGCGCAGACG CAGGGCAACA 14700 

GTGTCAGCGT GGATGTGCAG AACGCACAGC TTGACGGAAC ACAGGCTGCC AGAGACATCC 14760

TTACCCTGAA CGCCAGTGAA AAGCTCACCC ACAGCGGGAA AAGCAGTGCC CCGTCGCTCA 14820

GGCTCAGTGC GCCGGAACTG ACCAGCAGCG GCGTACTTGT TGGTTCCGCC GTGAATACAC 14880

AGTCACAGAC CCTGACCAAC AGCGGTCTGT TGCAGGGGGA GGCCTCACTC ACCGTTAACA 14940

CACAGAGGCT TGATAATCAG CAGAACGGCA CGCTGTACAG TGCTGCAGAC CTGACGCTGG 15000 
ATATACCGGA CATCCGCAAC AGGGGGCTTA TCACCGGTGA TAATGGTTTA ATGTTAAATG 15060 
CTGTCTCCCT CAGCAATCCG GGAAAAATCA TCGCTGACAC GCTGAGCGTG AGGGCGACCA 15120 
CGCTGGATGG TGACGGCCTG TTGCAGGGCG CCGGTGCACT GGGGCTTGCT GGCGACACCC 15180 
TCTCACAGGG TAGTCACGGA CGCTGGCTGA CGGCGGACGA CCTCTCCCTC CGGGGCAAAA 15240 
CACTGAATAC CGCAGGACCA CGCAGGGACA GAATATCACC GTGCAGGCGG ACAGATGGGC 15300 
GAACAGTGGT TCCGTGCTGG CAACCGGTAA CCTTACTGCT TCGGCAACCG GTCAGTTGAC 15360 
CAGTACCGGG GATATCATGA GCCAGGGTGA CACCACGCTG AAAGCAGCCA CCACGGACAA 15420

CCGGGGCAGT CTGCTTTCGG CCGGCACGCT CTCCCTTGAT GGAAACTGAC TGGATAACAG 15480

CGGCACTGTC CAGGGTGACC ATGTCACGAT TCGCCAGAAC AGTGTCACCA ACAGTGGCAG 15540

GCTCACCGGG ATCGCCGCGC TGACGCTTGC CGCCCGTATG GTATCCCCTC AACCTGCGCT 15600 
GATGAATAAC GGAGGTTCAT TGCTGACCAG CGGGGATCTG ACAATCACCG CAGGCAGTCT 15660

GGTAAACAGC GGGGCGATCC AGGCGGCTGA CAGCCTGACT GCACGTCTGA CGGGTGAGCT 15720

CGTCAGCAGA GCGGGCAGCA AAGTCACCTC GAAGGGTGAA ATGGCGCTCA GTGCACTGAA 15780

TTTAAGCAAC AGCGGACAAT GGATTGCAAA AAATCTGACC CT3AAGGCGA ACTCACTGAC 15840

CAGTGCGGGT GACATCACCG GTGTGGATAC TCTCACGCTC ACGGTGAATC AGACGCTGAA 15900 

CAATCAGGCG AACGGAAAAC TGCTCAGTGC AGGTGTGCTG ACGCTGAAGG CAGACAGTGT 15960 

CACAAACGAC GGGCAATTAC AGGGAAATGC CACCACCATC ACGGCAGGAC AACTCACAAA 16020

CGGCGGGCAT CTGCAGGGCG AAACGCTGAC GCTGGCCGGC TCCGGTGGCG TGAACAACCG 16080

TTCCGGTGGT GTTCTGATGA GCCGGAATGC ACTGAATGTC AGTACTGCGA CCCTGAGTAA 16140

CCAGGGCACG ATACAGGGTG GTGGCGGGGT TTCCCTGAAC GCCACTGACC GTCTGCAGAA 16200 

CGACGGCAAA ATCCTCTCCG GCAGTAACCT CACGCTGACG GCGCAGGTGC TGGCGAACAC 16260

CGGCAGCGGA CTGGTACAGG CTGCCACCCT GCTGCTGGAT GTGGTGAATA CTGTCAACGG 16320 

CGGACGCGTA CTTGCCACCG GCAGTGCCGA CGTTAAAGGA ACCACGCTGA ATAATACCGG 16380 

TACGCTTCAG GGTGCGGACC TGCTGGTGAA TTACCACACA TTCAGCAACA GCGGTACCCT 16440 

GCTGGGAACC TCCGGGCTTG GCGTCAAGGG CAGTTCACTG CTGCAAAATG GTACAGGGCG 16500 

GCTGTACAGT GCAGGCAACC TGCTGCTTGA CGCTCAGGAC TTCAGTGGTC AGGGGCAGGT 16560 

GGTGGCCACC GGTGATGTCA CACTGAAACT GATTGCTGCC CTCACGAATT ACGGTACCCT 16620 

GGCCGCAGGG AAAACCCTTT CCGTCACGTC GCAAAATGCC ATCACCAACG GCGGTGTCAT 16680 

GCAGGGTGAT GCCATGGTGC TCGGTGCCGG AGAGGCATTC ACCAACAATG GAACGCTGAC 16740 

TGCCGGTAAA GGCAACAGTG TTTTCAGCGC ACAGCGTCTT TTCCTTAACG CACCGGGTTC 16800 

ACTTCAGGCC GGTGGCGATG TGAGTCTGAA CAGCCGGAGT GATATCACCA TCAGTGGTTT 168 60 

TACCGGCACG GCAGGCAGTC TGACAATGAA TGTGGCCGGT ACCCTGCTGA ACAGTGCGCT 16920

GATTTATGCG GGGAATAACC TGAAGCTGTT TACAGACCGT CTGCATAACC AGCATGGTGA 16980

TATCCTGGCC GGCAACAGTC TGTGGGTACA GAAGGATGCT TCCGGCGGTG CAAACACAGA 17040

GATTATCAAT ACTTCCGGGA ATATTGAGAC GCATCAGGGC GATATTGTTG TAAGAACCGG 17100

GCATCTTCTG AACCAGCGGG AGGGATTTTC TGCCACAACA ACAACCCGGA CTAACCCCTC 17160 

ATCCATTCAG GGAATGGGAA ATGCTCTGGT TGATATTCCC CTTTCCCTTC TTCCTGACGG 17220 

CAGCTATGGC TATTTCACCC GTGAAGTTGA AAATCAGCAC GGTACGCCCT GCAACGGGCA 17280 

CGGGGCATGC AATATCACAA TGGATACGCT TTATTATTAC GCTCCGTTTG CTGACAGTGC 17340

CACACAGCGC TTTCTCAGCA GCCAGAACAT CACAACAGTA ACCGGTGCTG ATAATCCGGC 17400

AGGCCGCATT GCGTCAGGGC GTAATCTTTC TGCTGAGGCT GAACGACTGG AAAACCGGGC 17460

GTCATTTATC CTGGCGAATG GGGATATCGC ACTCTCGGGC AGAGAGTTAA GCAATCAGAG 17520 

CTGGCAGACG GGGACAGAGA ATGAATATCT GGTATACCGC TACGACCCGA AAACGTTTTA 17580

CGGTAGCTAT GCAACAGGCT CTCTGGATAA ACTGCCCCTG CTGTCACCGG AATTTGAAAA 17640

CAATACCATC AGATTTTCAC TGGATGGCCG GGAAAAAGAT TACACGCCCG GTAAGACGTA 17700 
TTATTCCGTT ATTCAGGCGG GCGGGGATGT TAAGACCCGT TTTACCAGCA GTATCAATAA 17760

CGGAACAACC ACTGCACATG CAGGTAGTGT CAGTCCGGTG GTCTCTGCAC CTGTACTGAA 17820

TACGTTAAGT CAGCAGACCG GCGGAGACAG TCTGACACAG ACAGCGCTGC AGCAGTATGA 17880

GCCGGTGGTG GTTGGCTCTC CGCAATGGCA CGATGAACTG GCAGGTGCCC TGAAAAATAT 17940

TGCCGGAGGT TCGCCACTGA CCGGTCAGAC CGGTATCAGT GATGACTGGG CACTGCCTTC 18000 

CGGCAACAAT GGATACCTGG TTCCGTCCAC GGACCCGGAC AGTCCGTATC TGATTACGGT 18060 

GAACCCGAAA CTGGATGGTC TCGGACAGGT GGACAGCCAT TTGTTTGCCG GACTGTATGA 18120 

GCTTCTTGGA GCGAAACCGG GTCAGGCGCC ACGTGAAACG GCTCCGTCGT ATACCGATGA 18180 

AAAACAGTTT CTGGGCTCAT CGTATTTTCT TGACCGCCTC GGGCTGAAAC CGGAAAAAGA 18240

TTATCGTTTC CTGGGGGATG CGGTCTTTGA TACCCGGTAT GTCAGTAACG CGGTGCTGAG 18300

CCGGACGGGT TCACGTTATC TCAACGGACT GGGTTCAGAC ACGGAACAGA TGCGGTATCT 18360 

GATGGATAAC GCGGCCAGAC AACAGAAAGG ACTGGGATTA GAGTTTGGTG TGGCGCTGAC 18420

AGCTGAACAG ATTGCTCAGC TTGACGGCAG CATGCTGTGG TGGGAGTCAG TCACCATCAA 18480

CGGACAGACA GTCATGGTCC CGAAACTGTA TCTGTCGCCG GAAGATATCA CCCTGCATAA 18540

CGGCAGCGTT ATCAGCGGGA ACAACGTGCA GCTTGCGGAC GGCAATATCA CCAACAGCGG 18600

CGGCAGCATC AACGCACAGA ACGACCTTTC GCTCGACAGT ACGGGCTATA TCGACAACCT 18660

GAATGCAGGG CTGATAAGCG CGGGCGGTAG CCTGGACCTG AGCGCCATCG GGGATATCAG 18720

CAATATCAGC TCAGTCATCA GCGGTAAAAC CGTACAACTG GAAAGCGTGA GTGGCAACAT 18780

CAGCAATATC ACCCGGCGTC AGCAATGGAA TGCGGGCAGT GACAGCCGAT ATGGTGGTGT 18840 

GCATCTCAGC GGTACGGACA CCGGTCCGGT TGCGACCATT AAAGGCACTG ATTCACTTTC 18900 

ACTGGATGCA GGGAAAAACA TTGATATTAC CGGGGCAACG GTCTCGTCCG GTGGAGACCT 18960

TGGAATGTCT GCGGGTAATG ACATCAACAT TGCCGTAAAC CTGATAAGCG GGAGCAAAAG 19020 

TCAGTCCGGT TTCTGGCACA CTGATGACAA CAGTTCATCA TCCACCACCT CACAGGGCAG 19080

CAGCATCAGC GCCGGCGGTA ACCTGGCGAT GGCTGCAGGC CATAATCTGG ATGTCACAGC 19140
ATCCTCTGTT TCTGCCGGGC ACAGCGCCCT GCTTTCTGCA GGTAACGACC TGAGTCTGAA 19200

TGCAGTCAGG GAAAGCAAAA ACAGTCGCAA CGGCAGGTCA GAAAGTCATG AAAGCCACGC 19260

AGCTGTGTCC ACGGTGACGG CGGGCGATAA CCTCCTCCTT GTTGCCGGTC GTGATATTGC 19320 
CAGTCAGGCT GCCGGTATGG CTGCGGAAAA TAACGTGGTC ATCCGGGGCG GACGTGATGT 19380 
GAACCTGGTG GCAGAGTCTG CCGGCGCAGG CGACAGCTAT ACGTCGAAGA AAAAGAAAGA 19440 
GATTAACGAG ACAGTCCGTC AGCAGGGAAC GGAAATCGCC AGCGGTGGTG ACACCACCGT 19500 
CACCGCAGGA CGGGATATCA CCGCTGTTGC GTCATCCGTT ACCGCAACCG GCAATATCAG 19560

CGTGAATGCC GGTCGTGATG TTGCCCTGAC CACGGCGACA GAAAGTGACT ATCACTATCT 19620 






WO 98/22575 



PCT/US97/21347 






GGAAACGAAG AAAAAAAGCG GAGGTTTTCT CAGTAAGAAA ACCACCCACA CCATCAGTGA 19680

GGACAGTGCC TCCCGTGAAG CAGGTTCCCT GCTGTCGGGG AACCGCGTGA CCGTTAACGC 19740

CGGTGATAAN CTGACGGTAG AGGGTTCGGA TGTGGTGGCT GACCGGGATG TGTCACTGGC 19800

GGCGGGTAAC CATGTTGATG TTCTTGCTGC CACCAGTACA GATACGTCCT GGCGCTTTAA 19860

GGAAACGAAG AAATCCGGTC TGATGGGTAC CGGCGGTATT GGTTTCACCA TTGGCAGCAG 19920

TAAGACAACG CACGACCGCC GCGAGGCSGG GACAACGCAG AGTCAGAGTG CCAGTACCAT 19980

CGGCTCCACT GCCGGTAATG TCAGTATTAC CGCGGGCAAA CAGGCTCATA TCAGCGGTTC 20040

GGATGTGATT GCGAACCGGG ATATCAGCAT TACCGGTGAC AGTGTGGTGG TTGACCCGGG 20100

GCATGATCGT CGTACTGTGG ACGAAAAATT TGAGCAGAAG AAAAGCGGGC TGACGGTTGC 20160

CCTTTCCGGC ACGNTGGGCA GTGCCATCAA TAATGCGGTC ACCAGTGCAC AGGAGACGAA 20220

GGAGAGCAGT GACAGCCGTC TGAAAGCCCT GCAGGCCACA AAGACAGCGC TGTCTGGTGT 20280

GCAGGCCGGA CAGGCTGCGG CAATGGCCAC CGCAACCGGT GACCCGAATG CGACGGGAGT 20340

CAGCCTGTCG CTTACCACCC AGAAATCGAA ATCACAACAA CATTCTGAAA GTGACACAGT 20400

ATCCGGCAGT ACGCTGAATG CCGGGAATAA TCTGTCTGTT GTCGCAACCG GCAAAAACAG 20460

GGGAGATAAC CGCGGAGATA TTGTGATTGC AGGAAGCCAG CTTAAGGCCG GTGGTAACAC 20520

AAGCCTGGAT GCCGCGAATG ATGTTCTGTT GAGTGGCGCT GCAAACACAC AAAAAACAAC 20580

GGGCAGGAAC AGCAGCAGTG GCGGTGGCGT GGGTGTCAGT ATCGGTGCCG GTGGTAACGG 20640

TGCCGGTATC AGCGTCTTTG CCAGCGTTAA TGCGGCAAAA GGCAGCGAGA AAGGTAACGG 20700

TACTGAGTGG ACTGAAACCA CAACAGACAG CGGTAAAACC GTCACCATCA ACAGTGGTCG 20760

GGATACGGTA CTGAACGGTG CTCAGGTCAA CGGCAACAGG ATTATCGCCG ATGTGGGCCA 20820

CGACCTGCTG ATAAGCAGCC AGCAGGACAC CAGTAAGTAC GACAGTAAAC AGACCAGCGT 20880

GGCTGCCGGC GGCAGTTTTA CCTTTGGCTC CATGACCGGC TCAGGTTACA TCGCTGCCTC 20940

CCGGGATAAG ATGAAGAGCC GCTTTGACTC CGTTGCTGAA CAAACCGGGA TGTTTTCCGG 21000

AGATGGCGGC TTCGATATCA CGGTCGGCAA CCACACCCAG CTCGATGGTG CGGTTATCGC 21060

TTCCACGGCG ACGGCAGATA AAAACAGCCT CGATACCGGG ACGCTCGGCT TCAGCGATAT 21120

TCACAACGAA GCGGATTATA AAGTCAGTCA CAGTGGAATC AGTCTGAGCG GTGGTGGCAG 21180

CTAACATGCP GGGTGGCATG ATATCCGrCG GAGGTCACAG w- X C *» Vj>

CGGACATGCG GAAGGAACGA CTCAGGCCGC AGTGGCAGAT GGCACAATCA CCATCCGGGA 21300

CAGGGACAAT CAGAAGCAGA ATCTGGCGAA CCTGAGCCGT GACCCTGCGC ACGCTAATGA 21360

CAGTATCAGC CCGATATTTG ACAAGGAGAA AGAGCAGAGG CGTCTGCAGA CAGTGGGGCT 21420

TATCAGTGAC ATTGGCAGTC AGGTGGCGGA TATCGCGCGG ACGCAGGGGG AACTGAATGC 21480

GTTGAAGCTG CGCAGGATAA ATATGGGCCT GTTCCGGCGG ATGCGACGGA AGAACAGCGG 21540
CAGGCATATC TGGCAAAACT GCGTGATACG CCGGAATACA AAAAGGAACA GGAAAAGTAT 21600

GGTACCGGCA GCGATATGCA GCGCGGTATC CAGGCTGCAA CGGCTGCACT TCAGGGCCTG 21660 

GTGGGCGGCA ATATGGCAGG CGCGCTGGCA GGTGCTTCAG CGCCGGAGCT GGCGAACATC 21720

ATCGGTCATC ACGCGGGTAT TGATGACAAT ACAGCGGCAA AAGCCATTGC CCATGCCATT 21780 

CTCGGTGGTG TGACAGCAGC CCTTCAGGGC AACAGTGCGG CAGCAGGCGC AATTGGTGCG 21840

GGTACTGGTG AAGTGATCGC GTCAGCCATT GCGAAAAGCC TCTACCCGGG CGTAGATCCG 21900 

TCGAAACTGA CAGAAGATCA GAAGCAAACT GTAAGCACGC TGGCAACGCT GTCAGCGGGT 21960 

ATGGCCGGCG GCATTGCCAG TGGCGATGTG GCTGGCGCGG CTGCTGGAGC TGGTGCCGGG 22020 

AAGAACGTTG TTGAGAATAA TGCGCTGAGT CTGGTTGCCA GAGGCTGTGC GGTCGCAGCA 22080 

CGTTGCAGGA CTAAAGTTGC AGAGCAGTTG CTAGAAATCG GGGCGAAAGC GGGCATGGCC 22140

GGGCTTGCCG GGGCGGCAGT CAAGGATATG GCCGACAGGA TGACCTCCGA TGAACTGGAG 22200 

CATCTGATTA CGCTGCAAAT GATGGGTAAT GATGAGATCA CTACTAAGTA TCTCAGTTCG 22260

TTGCATGATA AGTACGGTTC CGGGGCTGCC TCGAATCCGA ATATCGGTAA AGATCTGACC 22320 

GATGCGGAAA AAGTAGAACT GGGCGGTTCC GGCTCAGGAA CCGSTACACC ACCACCATCG 22380 

GAAAATGATC CTAAGCAGCA AAATGAAAAA ACTGTAGATA AGCTTAATCA GAAGCAAGAA 22440

AGTGCGATTA AGAAGATCGA TAACACTATA AAAAATGCTC TGAAAGATCA TGATATTATT 22500

GGAACTCTCA AGGATATGGA TGGTAAGCCA GTTCCTAAAG AGAATGGAGG ATATTGGGAT 22560 

GATATGCAGG AAATGCAAAA TACGCTCAGA GGATTAAGAA ATCATGCGGA TACGTTGAAA 22620

AACGTCAACA ATCCTGAAGC TCAGGCTGCG TATGGCAGAG CAACAGATGC T 22671
(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2385 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GGGCGACACG GAAATGTTGA ATACTCATAC TCTTCCTTTT TCAATATTAT TGAAGCATTT 60 

ATCAGGGTTA TTGTCTCATG AGCGGATACA TATTTGAATG TATTTAGGCA ACTGAAACCC 120 

GCTGACGGAT NANGTGTACA GTGGCATCAG TGGACGGMTT ACAGCATAAG TGCTTAAGGC 180 

GCGTGACCAT ACAGMTACGG TCGCTGCAGA GAACAGGGAG AATATCATCC GGAACACGGT 240

GGCCATAAAC CGTAACACCA GGGGGCTGCT TTCCCCGGGA GAGGTGCTGG AGATGCATGC 300 

GGACGTCTGA ACAGTCAGCA GGGCTGATTA ATGAGAATCA CGAGGAAATG AAGCGGGAGC 360

CGTACAGTGA GGATAAATTT AACGCCATAG CGGCTGTGGG CGGGTATAGT GCCAAGCAGA 420
CTGCTTAAAG GCAGGTACTA CTTTCAGTGG CGGCTATGTT TGCTGGAATG TGGGTGTCAA 480

CTGGTAGTTC TGAACCCGGG CCTGAGTCAC CGGGGAGGCA GTTTTCGGTA TGAAGTAATG 540

ATTCGCTGCC TGTTTTTCTC GGGGATGGCA TAAGTGACTG TTCCCGGGTA TTCGTGAAGA 600 

TCTGAGAGGA AGAGTGTATA TGCTGAACTA TCGCATAAGG TCAGTGCAGC TATTTATTGT 660 

AAACGGTCGG GCTGACAGGG CGCAGGTGCG TCTGGAATGC GACGATGAAG CCGTTTTTGA 720 

ATGTTATCTT CTTGCTGAAG GGGAAGGGGA ACTGAAAGAA GTGAGCCTGT CAGAGCTGGA 780

AGAGCGGGCG CTGATGTATG CGGCAGACAG TTTCCGTTAT GAATGATAAG TCAGTTATAC 840

CGGTAATGGT AAACGGAGCC GGTATCCGGG ATACAAGGGG GAGAGAGTAT GCTGATTATT 900 

ATTATGACCC GGGACAGATA TCTGGAATAT GGCCTGATGC GTATACTGAG CGGATATCAG 960 

GTCACGACAG GCAGAGAGCT GTTTAATGCC GGAAAGCAAC GTCAGTCACT TCCCGAAGAC 1020 

AGTTATGTGA TTCTCTGTGA CCGTAATCTG GAAAGGCTTA CATACTCTAT GTTCTGTGGG 1080

CGTCGGTTTC TTGTCATTCC TGTTTCCTCT GTGAGATGCC TGACAGATAT CAGGCAAACC 1140 

ATCCGCCGTG GAGCGTGGCT GTTCGGACAT ACGGCAAGGC CACTGACCCG GACAGAGATG 1200 

GTGGTGGTCT TCGGGGTTGT TTTCCATGAC TACGGGTTTA CCTTTCTGGC AGACCGGCTG 1260 

GGGATAACCA TGAAGACGGT ATGTGCGCAT CTTTACAATG CGATGGAGAA AAATGGTATG 1320 

CGCGGCGTCA GTATTAAATA TCTCTGCAAC ACCATAGACC GGTAAAAAGA TGGTTTTCTG 1380

ATAAAGGCTG TTGCGACGGG GATTTCTGTG CATGCTGTGT CACGGGCATC CCAGCTCTCC 1440

GGATAATTAA TGTTATGTAG TCAGGCGTGA TAAATTTCAT ATGGAACAGG TATGCGTTTT 1500 

ATTTGTGATA ACAGTTAATG AGGTGTTTCC ATACACACTG AAGTTACCTG TAATATTAGC 1560

GGGGGATTTG AATGATGTTG CGTGTCTGCG ACCACTCGTT TATTCATGCA AATAAGTGGA 1620 

CTGCTGGATC CACGGTAAGA GTACAGCGAG GGCCGTATTG ACGGGGATGT GTTATTCAGC 1680 

GGGCAGTGCT ATGCGCCACG GAAGCAGTTC GCTGACACGG TTGACCGGCC AGTCAGCTAT 1740 

GACGCCAAAC ACATGGCGAA GGTAGTTTTC TGGATCCTCG TCGTTCAGTT TGCACGTCCC 1800

GATCAGGCTG TACAGTAGCA CTCCCCGCTC ACCACCATGC TCAGAGCTGC GTATTACCGT 1860

GAAGGAGATC GGTGAGTAAC CCTCTGTGTC GGCACATTAT AGCCGTCACA TCGGATAACT 1920 

GTTATCCTTC TGTTCTGATG TATTCTGGGA GGTGATGTTT CACTCCTGAT AAGAGCATTA 1980 

CTAATTACAG CTGCTTTTGG GATAACATTC GGGCAGTTTT CTTTAATTCT GAAGTCTGAA 2040

AGAGATATCA GTAATTGTAT TGCTTTTAAA CATTGTCAGT ATTTATTTGT CCAAATCGTT 2100 

CACGTTTCTC ATAATCTTGC CGACAGTCAC CATCACAAAA CAATCCAGTC TTAACAGGTT 2160 

CTCCGCAGTT ATAGCAGAAT CCTGTTTCAG GGAGTCTATT CGGGATACGA TTTTTTAGTC 2220 

TGATGCTCAT GCTGAATTGT TCATTTTGAT AAGCAATATC TGCACTATCT GCCATAAACG 2280 

ATCCTCTGAG GAGACCACAT GTTTATAAGC CACCACCGAA ATATTACAAA GTAATACTCA 2340
TTGTATAATC TTTAACCRGG GGCAGGATAA TTGTATCCTG CCCCT 2385
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 746 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 



CTTTCAGACC AGCGTTTCCT GTCAGGAGAT GAGGAAGAAA CATCAAAGTA TAAAGGCGGC 60

GATGACCATG ATACGGTATT CAGTGGCGGT ATTGCGGCCG GTTATGATTT TTATCCGCAG 120

TTCAGTATTC CGGTTCGTAC AGAACTGGAG TTTTACGCTC GTGGAAAAGC TGATTCGAAG 180

TATAACGTAG ATAAAGACAG CTGGTCAGGT GGTTACTGGC GTGATGACCT GAAGAATGAG 240

GTGTCAGTCA ACACACTAAT GCTGAATGCG TACTATGACT TCCGGAATGA CAGCGCATTC 300

ACACCATGGG TATCCGCAGG ATTGGCTACG CAGAATTCAC CAGAAAACAA CCGGTATCAG 360

TACCTGGGAT TATGAGTACG GAAGCAGTGG TCGCGAATCG TTGTCACGTT CAGGCTCTGC 420

TGACAACTTC GCATGGAGCC TTGGCGCGGG TGTCCGCTAT GACGTAACCC CGGATATCGC 480

TCTGGACCTC AGCTATCGCT ATCTTGATGC AGGTGACAGC AGTGTGAGTT ACAAGGACGA 540

GTGGGGCGAT AAATATAAGT CAGAAGTTGA TGTTAAAAGT CATGACATCA TGCTTGGTAT 600

GACTTATAAC TTCTGACGAC ACTGCTCCTG AACGATAATT GCGTATATTC TGTAATTAAG 660

ATAATTGCAT ATCKTCTGCA ATTAARCAGA AATACCCTGC AGTCTATTAC TGCAGGGNTG 720

TCTTTTATCT GTTTTACAGA NAATTT 746



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 411 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:



TCTGTTTGTC GTTTTTTCCC CGTTGTAGCG GYTCTGCTCC TGGCTTCCCT GATAGTCAGC 60

CCGCAGGCGC CAGGGCCCCA GATTCCCCCC CACAGTCCCG TTATAACTGA ACTGATGAGA 120

GTCTCCTCCC TGATAATTAC GGGAAACCGT CCCGTTGAGG TTATAATCCA GCATCAGTCC 180

GGGAATGCCG TCGTCCCAGC GTGAGGGAGG CAGCCAGGTG GCATCAGAAT ACTCAAGCCC 240

AGCTGCGGCA TATTGATGCG TAATACGCCC GCTCCGGTAT CAGGACGAAT ATCCACTCCC 300

GGCAACCCAT GAAAATCCGC ACACTGACCA TCATGCCAGT AAACAACTTT ATCCAGAGAT 360

TCTGCTGTTA ACCCCATCAG TCTGACCATA TCTGATGTCA GACAGGCCTG C 411

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 977 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

TATTATCGCG CGCGCGCTGC ACAGGGGTTA TCTACATCTG CTGCTGCTGC CGGTTTAATT 60 

GCTTCTGTAG TGACATTAGC AATTAGTCCC CTCTCATTCC TGTCCATTGC CGATAAGTTT 120 

AAACGTGCAA ATAAAATAGA GGAGTATTCA CAACGATTCA AAAAACTTGG ATACGATGGT 180 

GACAGTTTAC TTGCTGCTTT CCACAAAGAA ACAGGAGCTA TTGATGCATC ATTAACAACG 240

ATAAGCACTG TACTGGCTTC AGTATCTTCA GGTATTAGTG CTGCKGCAAC GACATCTCTT 300 

GTTGGTGCAC CGGTAAGCGC ACTGGTAGGT GCTGTTACGG GGATAATTTC AGGTATCCTT 360

GAGGCTTCAA AGCAGGCAAT GTTTGAACAT GTTGCCAGTA AAATGGCTGA TGTTATTGCT 420

GAATGGGAGA AAAAACACGG TAAAAATTAC TTTGAAAATG GATATGATGC CCGCCATGCT 480

GCATTTTTAG AAGATAACTT TAAAATATTA TCTCAGTATA ATAAAGAGTA TTCTGTTGAA 540

AGATCAGTCC TCATTACTCA ACAACATTGG GATATGCTGA TAGGTGAGTT AGCTAGTGTC 600 

ACCAGAAATG GAGACAAGAC ACTCAGTGGT AAAAGTTATA TTGACTATTA TGAAGAGGGA 660

AAGCGGCTGG AAAGAAGGCC AAAAGAGTTC CAGCAACAAA TCTTTGATCC ATTAAAAGGA 720 

AATATTGACC TTTCTGACAG CAAATCTTCT ACGTTATTGA AATTTGTTAC GCCATTGTTA 780

ACTCCCGGTG AGGAAATTCG TGAAAGGAGG CAGTCCGGAA AATATGAATA TATTACCGAG 840

TTATTAGTCA AGGGTGTTGA TAAATGGACG GTGAAGGGGG TTCAGGACAA GGGGTCTGTA 900 

TATGATTACT CTAACCTGAT TCAGCATGCA TCAGTCGGTA ATAACCAGTA TCGGGNAATT 960 

CGTATTGAGT CACACCT 977



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TTTCTTAAGT CCGGCATTGC CACGCGTAAC CCCCACTTCA ACCGCATGAT TGAGCAGATC 60
GAAAAAGTGG CGATCAAATC CCGCGCGCCG ATTCTGCTTA ACGGTCCAAC CGGCGCGGGC 120
AAGTCATTTC TGGCGCGACG CATCTTAGAG TTAAAACAGG CGCGGCATCA GTTTAGCGGC 180
GCKTTTGTGG AAGTGAACTG CGCCACCCTG CGCGGCGATA CCGCCATGTC GACGCTGTTT 240
GGTCATGTAA AAGGCGCGTT TACCGGGGCG CGGGAATCTC GTGAAGGTTT ATTACGCAGC 300
GCCAACGGGG AAATGTTGTT TCTTGATGAG ATTGGCGAAC TGGGCGCGAC GAACAGGCAA 360
TGCTGCTGAA ACCCATTGAA GRGGAAAACC TTTTACCCGT 400

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12368 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
GTATGCGTTT TCATTAAGAT ATTCTCTGCT GTAGAGAAAC TTATAGCAAT ATAATCTGAT 60
AATATCTTTT ATGTAAAATT TAAATAGTTC ACCTGTGACA GATATATGTT TTCTGCTCAG 120
TAACTCCTGT GTATTAAGCC ATTCCCGTGA CCGAAGCACA CCCTTGTGAA AACTTTTTCT 180
TACTTGCTTT GAGGCACGGC ATTGATGTAA TATTTTTGCG TCCTCAATAA TTCTCTTTCC 240
CGTTTTATTT TTTGCAGCAT CTCTTACTCC ATAAAATATC TCCCGGTCCA GACTTTTGTC 300
ATATTTACTG ATTATACGAC AAATATTCCT GACCCGACGA TTCTCTTTAT TTCGCTTCCA 360
TAGCTTATAA TGATCATCGC ATAACCTTAA GGCATTTGCC TCATCAAATT CTGAAACAGG 420
ATTACTGCAT TTTTTATTCC GACAAATACC TTTGTTTTTA GCCATACTCT TCTTCCCGTC 480
AATGGAAAAA TTTTCACACC CATATTACCT GAATGATAAA CCGGATTAGT GTGATCCGGT 540
TCAGTGAAAT CAACAGGATA CCGGTATGCC ATTCAGCAAT TCTTCCCTCT CCGCGCAAGT 600
GAAATCATAT CTGACGTTTC TTCCTGAAGA AATACGCCAG AAAATCCTTG AACATCTCCA 660
CGGTGTTATT CATTACGAGC CCGTGATTGG CATTATGGGT AAATCCGGCA CCGGCAAGAG 720
CAGCCTGTGT AATGCCATTT TTCAGTCCCG TATCTGCGCC ACGCATCCCC TGAACGGCTG 780
CACCCGCCAG GCTCATCGTC TTACCCTGCA GCTCGGTGAA CGCAGAATGA CGCTGGTCGA 840
TCTGCCCGGC ATTGGTGAAA CACCGCAGCA TGATCAGGAA TACCGAGCGC TTTATCGTCA 900
GTTACTGCCG GAACTGGATC TGATTATCTG GATCCTGCGG AGTGATGAAC GTGCGTATGC 960
TGCCGATATT GCCATGCATC AGTTTTTACT GAATGAGGGC GCAGATCCCT CGCGCTTTCT 1020
GTTTGTTCTC AGCCATGCCG ATCGCATGTT TGCTGCTGAA GAATGGAATG CCACAGAAAA 1080
ATGCCCGTCC CGTCACCAGG AACTCTCACT GGCGACAGTA ATAGCCCGGG TGGCCACCCT 1140
GTTCCCTTCA TCATTTCCGG TACTCCCTGT AGCCGCACCT GCAGGCTGGA ACCTTCCAGC 1200
GCTGGTGTCA CTGATGATCC ACGCGCTGCC ACCACAGGCA ACCAGCGCAG TTTATTCACA 1260

TATCAGGGGG GAAAACCGCT CTGAACAGGC CCGGAAACAC GCACAACAGA CTTTTGGTGA 1320

TGCCATCGGG AAAAGTTTTG ACGACGCCGT T3CCCGGTTC AGTTTTCCGG CCTGGATGTT 1380

ACAGCTTCTG CGTAAAGCCC GGGACCGCAT TATCCACCTG CTGATCACAC TGTGGGAGCG 1440

TCTGTTCTGA CACACTGACG CCGACAGATG TGTCGCTGGA TTAACGAGCA TTCTTCTTTT 1500

TATGAAATCA TGCTTAAAAA TCAGATAATT ARAAGAATAT TTTTTCTGCT GCATTTTATT 1560

CCTGATTATC CGGATGCGAC ACATCCTTTC AACATCATGA TGCATAATAA CATCATGAAA 1620

TAAAAGATGT TTTCTTACGG AGTGCACATC TATGTCTGAT AATCGTTCCC GGCATGATCG 1680

CCTGGCGGTT CGCTTATCAC TCATTATCAG CCGACTGATG GCCGGAGAAT CTCTGTCACT 1740

AAAAACACTG TCAGATGAAT TTGGCGTTAC AGAACGTACT TTACAGCGCG ATTTTCATCA 1800

GCGTCTGGTT CACCTAGATT TAGAGTACAG AAATGGCAGG TACAGCCTCA GACGACAGAG 1860

CAGCCCAGGT GCGATCCCTG AAATGCTTTC TTTTATACAG AATACCGGGA TCGCACGGAT 1920

ACTTCCGCTC CGGAACGGAC GACTGATAAC CTGTCTTACC GACAACCAGG AGCCCTCTCC 1980

CTGCCTTATC TGGCTACCGG CGCCGGATAT CACTGCAACG TTCCCCGAGT GTTTCTCGCA 2040

ACTCATCCTG GCAATAAGAC AGTGTATCCA CATCTCTCTG ATGACTGAGC GATGGTATCC 2100

GTCACTGGAG CCCTGCCGGC TCATTTATTA CAGCGGTAGC TGGTATCTGA TCGCGTTACA 2160

GAAGGGAAAA CTGCAGGTCT TTCCTCTGGC AGATATCAAA TCAGTCAGCC TGACATCAGA 2220

ACGGTTTGAA CGGAGAGGCC ACATCCACAG TCTGGTCGCT GAAGAGCGTT TTATCTCCGC 2280

CCTGCCACAT TTCTCTTTCA TCCATAAACT TATCAACACC TTTAACCTGT GATCGCCGGC 2340

CTGCCAAAGC CGTCCCGACA GGTATGGAGA CAATATGTTG AACAGAAAAC TAAATATACG 2400

GCTACGTCAT TCCCTGAACA GTCACTGCAT ACCTTCCATC ATTATCAATA ACACCGTACG 2460

TTCATTTCAG AGGTCAGTCA TGAATACCAG AGCTCTTTTT CCCCTGCTGT TCACTGTGGC 2520

ATCATTCTCC GCCTCCGCCG GCAACTGGGC TGTCAAAAAC GGCTGGTGTC AGACCATGAC 2580

GGAAGATGGT CAGGCGCTGG TAATGCTGAA AAATGGCACG ATTGGTATTA CCGGCCTGAT 2640

GCAGGGATGC CCGAATGGTG TACAGACGCT CCTGGGCAGC CGTATCAGTA TTAACGGTAA 2700

CCTGATCCCC ACATCACAAA TGTGTAATCA GCAGACGGGA TTCAGGGCTG TTGAGGTGGA 2760

AATCGGACAG GCGCCGGAAA TGGTCAAAAA AGCCGTTCAC TCCATAGCAG AGCGTGATGT 2820

GTCCGTTTTA CAGGCATTTG GTGTACGAAT GGAATTCACC CGCGGTGATA TGCTGAAGGT 2880

CTGTCCGAAA TTTGTCACAT CACTTGCCGG TTTTTCCCCG AAACAGACGA CCACTATTAA 2940

TAAAGATTGC GTCCTGCAGG CTGCCCGGCA GGCATACGCC CGGGAATATG ACGAGGAAAC 3000

AACAGAAACC GCTGATTTTG GCTCTTACGA AGTAAAAGGC AATAAGGTTG AGTTTGAAGT 3060

ATTCAATCCT GAAGACCGTG CGTACGACAA AGTGACCGTC ACGGTTGGTG CTGACGGTAA 3120

TGCCACCGGC GCCAGCGTTG AATTTATCGG AAAATAGCCG GTATGTCGGA CTGCCACCCT 3180
GTTTTATTGC CCGAAGGCCC TTTCTCACGC GAACAGGCGA TGGCTGTCAC AACAGCTTAC 3240

CGCAATGTGC TTATTGAAGA TGACCAGGGA ACGCATTTCC GGCTGGTTAT CCGCAATGCC 3300

GAAGGGCAGC TACGCTGGCG GTGCTGGAAT TTTGAACCTG ATGCCGGAAA ACAGCTAAAT 3360

TCGTATCTCG CCAGTGAGGG AATTCTCAGG CAATAAACGT CTTCATTTCA TCCATCAGGC 3420

CGCGTCTTCT CCGGGAGACG CGGCCTTTTC GTTTATACCG CTAATTCATT CATAAGGAGC 3480

AAAGTATGCA ATTAGCCAGT CGTTTTGGTC ATGTAAATCA GATCCGTCGG GAGCGCCCAC 3540

TGACACGCGA AGAACTGATG TACCACGTCC CGAGTATTTT TGGAGAAGAC CGGCACACCT 3600

CCCGCAGTGA ACGGTATGCG TACATTCCCA CCATCACCGT CCTGGAAAAT CTGCAGCGGG 3660

AAGGCTTTCA GCCGTKCTTC GCCTGCCAGA CCCGTGTGCG CGACCAGAGC CGCCGGGAAT 3720

ATACCAAACA TATGCTGCGT CTGCGGCGGG CCGGACAGAT AACCGGTCAG CATGTGCCTG 3780

AAATTATTCT GCTCAACTCC CATGACGGTT CATCCAGCTA CCAGATGTTA CCGGGATATT 3840

TTCGTGCCAT TTGTACCAAT GGCCTGGTCT GCGGTCAGTC GCTGGGAGAA GTCCGGGTGC 3900 

CACACCGGGG AAACGTGGTG GACAGGGTCA TAGAAGGTGC TTACGAAGTG GTGGGCGTGT 3960

TTGACCTGAT TGAGGAAAAG CGTGATGCCA TGCAGTCGCT GGTCCTGCCG CCACCGGCAC 4020

GCCAGGCGCT GGCACAGGCG GCGCTGACTT ACCGTTATGG TGATGAACAT CAGCCCGTCA 4080

CCACTACCGA CATTCTGACG CCACGACGCC GGGAGGATTA CGGTAAGGAC CTGTGGAGTG 4140 

CTTATCAGAC CATCCAGGAG AATATGCTGA AAGGCGGGAT TTCCGGTCGC AGTGCCAGAG 4200

GAAAACGTAT CCATACCCGG GCCATTCACA GCATCGATAC CGACATTAAG CTCAACCGGG 4260

CGTTGTGGGT GATGGCAGAA ACGCTGCTGG AGAGCCTGCG CTGATACCGT TTCCCTGAAA 4320

GCGCAGTCCT GTTCACGGCT GTCCCTTCCC CCAGACATTC CACCATTCAT TTACTTTTTA 4380

TAAGGAATAA TCTCATGACA ACCTCTTCGC ATAATTCCAC CACACCTTCT GTTTCCGTGG 4440

CCGCTGCATC AGGGAATAAC CAGTCTCAGT TGGTTGCCAC TCCCGTCCCT GATGAACAGC 4500
GCATCAGCTT CTGGCCGCAG CATTTTGGCC TCATTCCACA GTGGGTCACC CTGGAGCCCC 4560

GTGTCTTCGG CTGGATGGAC CGTCTGTGCG AAAACTACTG CGGGGGTATC TGGAATCTGT 4620

ACACCCTGAA CAACGGTGGC GCATTTATAG CACCTGAACC GGATGAAGAT GATGGAGAAA 4680

CCTGGATACT GTTCAATGCC ATGAACGGTA ACCGCGCTGA AATGAGCCCG GAAGCTGCCG 4740

GCATTGCCGC CTGTCTGATG AGGTACAGCC ATCATGCCTG TCGTACGGAG AATTATGCCA 4800

TGACGGTCCA TTATTACCGG TTGCGGGATT ACGCCCTGCA GCATCCGGAA TGCAGCGCCA 4860

TTATGCGCAT CATTGACTGA AAGGGGCCGG AATAATGCAA CAGATTTCCT TTCTGCCCGG 4920

AGAAATGACG CCCGGCGAGC GCAGTCACAT TCTGCGGGCC CTGAAAACCC TGGACCGCCA 4980

TCTTCATGAA CCCGGTGTGG CCTTCACCTC CACCGGTGCG GCACGGGAAT GGCTGATTCT 5040
GAACATGGCG GGACTGGAGC GTGAAGAGTT CCGGGTGCTG TATCTGAATA ACCAGAATCA 5100 
GCTGATTGCC GGTGAAACCC TCTTCACCGG CACCATCAAG CGCACGGAAG TCCATCCCCG 5160

GGAAGTGATT AAACGCGCCC TGTACCACAA TGCCGCTGCG GTGGTGCTGG CGCACAATCA 5220 

CCCGTCCGGT GAAGTCACAC CCAGTAAGGC AGACCGGCTT ATCACCGAAC GTCTGGTACA 5280

GGCACTGGGC CTGGTGGATA TCCGGGTGCC GGACCATCTG ATAGTCGGTG GCAGCCAGGT 5340

TTTCTCCTTT GCGGAACACG GTCTGCTTTA ACGCGTCACC GTCACAATCA CCTTCATATC 5400

ACTTCAGTTT CTCTTTCTCA GCTGTTTCTT ACTTTCACAT TCAGGAGGAC TATTCTCATG 5460

AAAATCATGA CCCGTGGTGA AGCCATGCGT ATTCACCGTC AGCATCCTGC ATGCCGTCTT 5520 

TTTCCGTTCT GTACCGGTAA ATACCGCTGG CACGGTAGCA CGGATACATA TACCGGCCGT 5580

GAAGTACAGG ATATTCCCGG TGTGGTGGCT GTGTTTGCTG AACGCCGTAA GGACAGTTTT 5640

GGCCCGThT?- TCCGGCTGAT GAGCGTCACC CTGAACTGAA TCAGGACGGG CATTCAGAAG 5700 

AGCAGAATTA TCGCCACCAC CGGACCATTC TTAACCAATT TTCTGTGAGG ATTTTATCGT 5760

GTCAGACACT CTCCCCGGGA CAACGCATCC CGACGATAAC AACGACCGCC CCTGGTGGGG 5820

GCTACCCTGC ACCGTGACGC CCTGTTTTGG GGCACGTCTG GTGCAGGAGG GTAACCGGTT 5880 

GCATTACCTT GCAGACCGCG CCGGTATCAG AGGCCGGTTC AGCGACGCGG ATGCGTACCA 5940

TCTGGACCAG GCCTTTCCGC TGCTGATGAA ACAACTGGAA CTCATGCTCA CCAGCGGTRA 6000 

ACTGAATCCC CGCCATCAGC ATACCGTCAC GCTGTATGCA AAAAGGCTGA CCTGCGAANC 6060

GACACCCTCG GCAGTTGTGG CTACGTTTAT ATGGCTGTTT ATCCGACGCG CGAAACGAAA 6120 

AAGTAACTCT CCAGAATAAC CTTCTGCCCC GGCCTGGTGC TTTCACCACG CCACTTTTCC 6180 

ATTTTTCATC TCTGCATATC AGGAAAATCT TCAGTATGAA CACATTACCC GATACACACA 6240

TACGGGAGGC ATCGCATTGC CAGTCTCCCG TCACCATCTG GCAGACACTG CTCACCCGAC 6300 

TGCTGGACCA GCATTACGGC GTCACACTGA ATGACACACC GTTCGCTGAT GAACGTGTGA 6360

TTGAGCAGCA TATTGAGGCA GGCATTTCAC TGTGTGATGC GGTGAACTTT CTCGTTGAAA 6420

AATACGCACT GGTGCGTACC GACCAGCCGG GATTCAGCGC CTGTACTCGT TCTCAGTTAA 6480

TAAACAGTAT TGATATCCTC CGGGCCCGCC GGGCAACCGG CCTGATGGCC CGCGACAATT 6540

ACAGAACGGT AAATAACATT ACCCTGGGTA AGCATCCGGA GAAACGATGA AACTTTCCCT 6600 

GATGCTGGAA GCCGACAGAA TTAATGTGCA GGCACTGAAC ATGGGGCGAA TTGTCGTTGA 6660 

CGTCGATGGT GTTAATCTCA CTGAACTGAT TAACAAGGTC GCTGAAAACG GTTATTCACT 6720 

CCGCGTGGTG GAGGAATCCG ACCAACAGTC AACCTGCACA CTACCACCGT TTGCAACCCT 6780

TGCCGGCATA CGCTGCAGTA CCGCACATAT CACGGAAAAG GATAACGCCT GGCTGTACTG 6840

GCTGTCACAC CAGACCAGTG ACTTCGGTGA ATCAGAATGG ATTCATTTCA CAGGTAGCGG 6900 

ATATCTGTTA CGTACCGATG CGTGGTCATA TCCGGTTCTG CGGGTTAAAC GCCTGGGGCT 6960

GTCAAAAACG TTCCGTGGTC TGGTTATCAC ACTTACCCGA CGTTATGGCG TCAGTCTCAT 7020 
TCATCTGGAT GCCAGCGCTG AATGCCTGCC GGGTTTACCC ACTTTCAACT GGTAACCAGG 7080
AACAACATGA AATCATTAAC CACGGAAACC GCACTGGATA TTCTGATTGC GTGGCTGCAG 7140
GACAATATCG ACTGCGAATC GGGAATTATC TTTGACAACA ATGAGGATAA AACGGATTCA 7200
GCAGCACTGT TGCCCTGTAT CGAACAGGCC AGAGAGGATA TCCGTACCCT GCGCCAACTG 7260
CAGCTTCAGC ACCAGAACCG GTGAGTCTCA CTCATCATCT CACTCACCAG ACTTCATTCC 7320
ACTSACGCCA GCCTGAACAC GGCTGGCGTT TTCATTTATC TGCAAAAAGG AATATCGATT 7380
ATGTCTGAAA TCACAGTCTC CCGTCCGGAA GTGGTCAACG AGAATACGGA CGTTATCTGC 7440
TCCACCTCAG TCAGGTACAG GTCACTGGAA TATGATAATT TTCCGGAAAT CAGCGAAGCG 7500
AACATTCTGA GCACATTTGA ACAACTGCAC CAGAACAAAG ATGAAGTGTT TGAACGGGGA 7560
GTGATCAACG TCTTCAAAGG GCTGAGCTGG GATTACAAAA CCAACTCACC CTGTAAATTT 7620
GGCAGTAAAA TTATCGTCAA CAATCTGGTG AGATGGGACC AGTGGGGATT TCATCTTATC 7680

AGTGGAATGC AGGCAGATCG CCTGGCTGAC CTGGAAAGAA TGTTGCATCT GCTCAGCGGT 7740
AAACCGATCC CCGACAACCG AGGGAATATC ACCATTAATC TGGATGACCA CATACAGTCC 7800
GTTCAGGGTA AAGGACGCTA TGAAGATGAG ATGTTCATCA TTAAATACTT TAAGAAGGGA 7860
TCTGCACACA TCACTTTCAA AAGGCTGGAG CTGATTGACA GAATTAACGA TATAATAGCC 7920
AGGCACTTTC CTTCTGTGCT CTCAGCCTGA CCCCGAGTTT GATTCCCTTT CGATATCAAA 7980
AGGGACTGCG GGTACAAAAG AGGGTACATC TTTCACCAAA CCAAACAAAA TAAACTAATA 8040
TCAACATGAT AGAAGCATTC TTCGATTCCG AGTCCGGCAC CAAATTCATA TAAACGGACC 8100
TCCACGGAGG TCCGTTTTTC GTTTCAGGAC GCCACGATTT AAGCGTCCTG CCGCCAAATC 8160
AATTCTACCG AACTCAACCA GATTCTCCCC ACATCACCAG CAATTTGCGG GCATATCCCA 8220
ATTCGGGAAA ATTTGTTTCT GAGCTATAGC GCTGACTGAC GTGAAATGTC GTGCGGCCCC 8280
GTGATGCTGT TGAAMGTCAA ATGACGTCAT CAGGAGCGTA ACGCACCCAT AAAGCACAAC 8340
ATCGGGCAGA ACGCCAACTG ATGAGATTTT CTGAATGAGA ACAAAGAGAA ATGTATCAGT 8400

CCGTTTGCTC ATGCAAAGAC TAACAATCCA TTAAAATAGT AAGCGCTCCG GACAATTTTC 8460
CATGGATTAT TTTCTGAACA TTTTTCTTTG GCAAAGATGA TGAATTTTGA TGGTAAGGAA 8520
AATTACTTCT GGTTCTCAGT AAAATCCTTT CGTAATACTA TGTAATCAAG AAGTTTATGG 8580
CTAGTAAAAA TAACGTCTTG CATTCACCAA TAATATGTAA ATAAACCCAT CTATAGATGG 8640
AAAAAATAGG TTATGGAATT ATCATTGCAT GATTCCCTTT TCGAATGAGT TTCTATTATG 8700
CAACAACCTG TAGTTCGCGT TGGCGAATGG CTTGTTACTC CGTCCATAAA CCAAATTAGC 8760
CGCAATGGGC GTCAACTTAC CCTTGAGCCG AGATTAATCG ATCTTCTGGT TTTCTTTGCT 8820
CAACACAGTG GCGAAGTACT TAGCAGGGAT GAACTTATCG ATAATGTCTG GAAGAGAAGT 8880
ATTGTCACCA ATCACGTTGT GACGCAGAGT ATCTCAGAAC TACGTAAGTC ATTAAAAGAT 8940
AATGATGAAG ATAGTCCTGT CTATATCGCT ACTGTACCAA AGCGCGGCTA TAAATTAATG 9000

GTGCCGGTTA TGTGGTACAG CGAAGAAGAG GGAGAGGAAA TAATGCTATC TTGGCCTCCG 9060

CCTATACCAG AGGCGGTTCC TGCCACAGAT TCTCCCTCCC ACAGTCTTAA CATTGAAAAC 9120

ACCACAACGC CACCTGAACA ATCCCCAGTT AAAAGCAAAC GATTCACTAC CTTTTGGGTA 9180 

TGGTTTTTTT TCCTGTTGTC GTTAGGTATC TGTGTAGCAC TGGTAGCGTT TTCAAGTCTT 9240

GAAACAGGTC TTCCTATGAG TAAATCGCGC ATTTTGGTCA ATCCACGCGA TATTGACATT 9300 

AATATGGTTA ATAAGAGTTG TAACAGGTGG AGTTCTCCGT ATCAGCTCTC TTACGCGATA 9360 

GGCGTGGGTG ATTTGGTGGC GACATGACTT AACACCTTGT CCACCTTTAT GGTGCATGAC 9420

AAAATCAACT ACAACATTGA TGAACCGAGC AGTTCCGGTA AAACATTATC TATTGCGTTT 9480

GTTAATCAGC GCCAATACCG TGCTCAACAA TGGTTTATGT CGGTAAAATT GGTAGACAAT 9540

GCAGATGGTT CAACCATGCT GGATAAACGT TATGTCATCA CTAACGGTAA TGAGCTGGCG 9600 

ATTCAAAATG ATTTGCTCCA GAGTTTATCA AAAGGGTTAA ACCAACCGTG GCCACAACGA 9660 

ATGGAGGAGA TGCTCCAGCA AATTTTGCCG CATCGTGGTG CGTTATTAAC TAATTTTTAT 9720 

CAGGCACATG ATTATTTACT GCATGGTGAT GATAAATCAT TGGATGGTGC CAGTGAATTA 9780

TTAGGTGAGA TTGTTCAATC ATCCCCAGAA TTTACCTACG CGAGAGCAGA AAARGCATTR 9840

GTTGRTATCG TGCGCCATTC TCAACATGCT TTAGACGRAA AACAATTAGC CAGCACTGAA 9900 

CACAGAAATA GATAACATTG TTACAGTGCC GGAATTGAAC AACCTGTCCA TTATATATCA 9960 

AATAAAAGGG GTCAGTGCCC TGGTAAAAGG TAAAACAGAT GAGTCTTATC AGGCGATAAA 10020

TACCGGCATT GATCTTGAAA TGTCCTGGCT AAATTATGTG TTGCTTGGCA AGGTTTATGA 10080 

AATGAAGGGG ATGAACCGGG AAGCAGCTGA TGCATATCTC ACCGCCTTTA ATTTACGCCC 10140

AGGGGCAAAC ACCCTTTACT GGATTGAAAA TGGTATATTC GAGAGTTCTG TTCCTTATGT 10200

TGTACCTTAT CTCGACAAAT TTCKCGCTTC AGAATAAGTA ACTCCCGGGT TGATTCATGC 10260 

TCGGGAATAT TTGTTGTTGA GTTTTTGTAT GTTCCCGTTG GTATAATATG GTTCGGCAAT 10320 

TTATTTGCCG CATAATTTTT ATTACATAAA TTTAACCAGA GAATGTCACG CAATGCATTG 10380 

TAAACATTGA ATGTTTATCT TTTCATGATA TCAACTTGCG ATCCTGATGT GTTAATAAAA 10440 

AACCTCAAGT TCTCACTTAC AGAAACTTTT GTGTTATTTC ACCTAATCTT TAGGATTAAT 10500 

CCTTTTTTCG TGAGTAATCT TAGCGCCAGT TTGGTCTGGT CAGGAAATAG TTATACATCA 10560

TGACGCGGAC TCCAAATTCA AAAATGAAAT TAGGAGAAGA GCATGAGTTC TGCCAAGAAG 10620

ATCGGGCTAT TTGNCCTGTA CCGGTGTTGT TGCCGGTAAT ATGATGGGGA GCGGTATTGC 10680

ATTATTACCT GCGAACCTAG CAAGTATCGG TGGTATTGCT ATCTGGGGTT GGATTATCTC 10740 

TATTATTGGT GCAATGTCGC TGGCATATGT ATATGCCCGA CTGGCAAGAA AAAACCCGCA 10800

ACAAGGTGGG CCAATTGGGT ATGGCGGAGA AATTTCCCCT GCATTTGGTT TTCAGACAGG 10860
TGTTCTTTAT TACCATGCTA ACTGGATTGG TAACCTGGCA ATTGGTATTA CCGCTGTATC 10920

TTATCTTTCC ACCTTCTTCC CAGTATTAAA TGATCCTGTT CCGGCGGGTA TCGCTGTTAT 10980

TGCTATCGTC TGGGTATTTA CCTTTGTGAA TATGCTCGGC GGTACCTGGG TAAGCCGTTT 11040

AACCACGATT GGTCTGGTGC TGGTTCTTRK TCCTGTGGTG ATGACTGCTA TTGTTGGCTG 11100

GCATTGGTTT GATGCAGCAA CTTATGCAGC TAACTGGAAT ACTGCGGATA CCACTGATGG 11160

TCATGCGATC ATTAAAAGTA TTCTGCTCTG CCTGTGGGCC TTCGTGGGTG TTGAATCCGC 11220

AGCAGTAAGT ACTGGTATGG TTAAAAACCC GAAACGTACC GTTCCGCTGG CAACCATGCT 11280

GGGTACTGGT TTAGCAGGTA TTGTTTACAT CGCTGCGACT CAGGTGCTTT CCGGTATGTA 11340

TCCGTCTTCT GTAATGGCGG CTTCCGGTGC TCCGTTTGCA ATCAGTGCTT CAACTATCCT 11400

CGGTAACTGG GCTGCACCAC TGGTTTCTGC ATTCACCGCC TTTGCGTGTC TGACTTCTCT 11460

GGGCTCCTGG ATGATGTTGG TAGGCCAGGC AGGTGTACGT GCCGCTAACG ACGGTAACTT 11520

CCCGAAAGTT TATGGTGAAG TCGACAGCAA CGGTATTCCG AAAAAAGGTC TGCTGCTGGC 11580

TGCAGTGAAA ATGACTGCCC TGATGATCCT CATCACTCTG ATGAACTCTG CCGGTGGTAA 11640

AGCCTCTGAC CTGTTCGGTG AACTGACCGG TATCGCAGTA CTGCTGACTA TGCTGCCGTA 11700

CTTCTACTCT TGCGTTGACC TGATTCGTTT TGAAGGCGTT AACATCCGCA ACTTTGTCAG 11760

CCTGATCTGT TCTGTACTGG GTTGCGTGTT CTGCTTCATC GCGCTGATGG GCGCAAGCTC 11820

CTTCGAGCTG GCAGGTACCT TCATCGTCAG CCTGATTATC CTGATGTTCT ATGCTCGCAA 11880

AATGCACGAG CGCGAGAGCC ACTCAATGGA TAACCACACA GCGTCTAACG CACATTAATT 11940

AAAAGTATTT TCCGAGGCTC CTCCTTTCAT TTTGTCCCAT GTGTTGGGAG GGGCCTTTTT 12000

TACCTGGAGA TATGACTATG AACGTTATTG CAATATTGAA TCACATGGGG GTTTATTTTA 12060

AAGAAGAACC CATCCGTGAA CTTCATCGCG CGCTTGAACG TCTGAACTTC CAGATTGTTT 12120

ACCCGAACGA CCGTGACGAC TTATTAAAAC TGATCGAAAA CAATGCGCGT CTGTGCGGCG 12180

TTATTTTTGA CTGGGATAAA TATAATCTCG AGCTGTGCGA AGAAATTAGC AAAATGAACG 12240

AGAACCTGCC GTTGTACGCG TTCGCTAATA CGTATTCCAC TCTCGATGTA AGCCTGAATG 12300

ACTGCGTTTA CAGATTAGCT TCTTTGAATA TGCGCTGGGT GCTGCTGATG ATATTGCTAA 12360

CAAGATCC 12368



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 833 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GCACGGCACT CTGATGTANC TTTTATCTGT TCCGAGTGGA AGCATGCCCC ACAACTGAGT 60

CATTAAGTGT GGAAGAACAG TTTTGTCCCC GCCTGCAATC TC7CCCTTTC NAAAAACCAG 120 

TATGTCGCCA TGCCTCGCCT TAATGGAGAG CGCTGAACCA TACCTTCTTT TTCCCAGTAA 180 

TAACAGGTAA TAGCGTGCCT GGTAATCCGT TACCGCCAGC GCCTCCGCAA TTTCTGCGGT 240

TTTCCCTCCA TTATGCCTGT TCAGAAATYC CAGTATTTCA TTCTTCATAT ATTGACTCAT 300 

GTCAGTGTAA CAAAGTTYCT YCGAATAATA AAAATCATGC TTTCTGTTAT CAACGGAAAG 360 

GTATTTTTAT TCTCTGTGTT TGCTTTATTT GTGAAATTTA GTGAATTTGC TTTTTGTTGG 420

CTTTATTTGN ATGTGTGTCA CATTTTGTGT GTTATTTTTC TGTGAAAAGA AAGTCCGTAA 480

AAATGCATTT AGACGATCTT TTATGCTGTA AATTCAATTC ACCATGATGT TTTTATCTGA 540

GTGCATTCTT TTTGTTGGTG TTTTATTCTA GTTTGATTTT GTTTTGTGGG TTAAAAGATC 600 

GTTTAAATCA ATATTTACAA CATAAAAAAC TAAATTTAAC TTATTGCGTG AAGAGTATTT 660 

CCGGGCCGGA AGCATATATG CAGGGGCCCG ACAGAAGGGG GAAACATGGC GCATCATGAA 720 

GTCATCAGTC GGTCAGGAAA TGCGTTTTTG CTGAATATAC GCGAGAGCGT AYTGTTGCCC 780

GGCTMTATGT CTGAAATGCA TTTTTTTTTA CTGATAGGTA TTTCTTCTCA TTC 833 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2916 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

TGCACCATCA CTGATACCAC CGGGACCCCG GATTTTATCC GGTCCCCGCG GACTGACAGG 60 

GTTTGTGACA CCTGAGTCAT ATCCGATGTA AACTTCATTT TCACGGGTTG TACAGGAAAA 120 

CTCCCCTGTG CCATTGAGTT CTGATGTGTG CCCTTCGCCA CAACTCCCAC CGTCACGGCA 180

CCAGTTGCAT CTGACGCCGA CCAACTGCTG AGAGCCATGC CGTTTCCGGC TTTGTCGACA 240

ACGCATGCTG CAGTTCCCAG CGATGCGAAC TGGTCTGGCA TGCATTCACG AACCAACAGC 300 

AGTGGTGCTA CGTCCGGATG CAATTCGCAT GAGCTCCAAC CGCGGTTGTA AGTTCAGCAG 360 

CCCGGGCCTC TGCCCCCGGC ACAGTCGCAT AAGTATTCGA TACCGTGCGA CACCATTACC 420

TTCAGGATAC GCCACGGACC CGTCACCCTA CGAAAACGCC GGAGCACCGG CAATCAGCAA 480

AGGCAGCAGT GATAAAAGAC TGATATATTT CCTGTCATTA TTTTTCATAT TAATTTAACT 540

CCTGATTAAC CGGTTTTTAT TGATATGAGA AAGTAATAGT TGCAATAGCC TTCACACTTC 600 

CAGGTGTAGT TGCATCAGCA ATTTTTATAT AATTGGCTCT TAAATTGATA TGTGGATTTA 660

CCTCTCCCCT GTAATCGGAG AAGTGCCATT GACTGCCATT TCCTTTCACA GGGGAGTCTT 720 
CACCATAGCT GATGGCAGTT ACATCACTGT CTTTATATAG CCTGATGCCA AATCCTTTTG 780

CAGTGGATTC ACTGCTTAAG GTCAATATAT CTGTTCTGTT CACTGGCTGT GATGCATCTG 840

TCAATGTAGC ATAAACATCA ATTCCATCCG GGCATTGTAG GTGTATGTCA ATTTTACCTC 900

CCTGTATTT2 TTTATACAAA GATGTGAACT GTGATTGATA TACGGTATTT AATGGCACCA 960

CATAGTTTTT TTGCCCCATG GTACATGTCT GACTCTGTAC CTGAATGCGC CCACCATTTA 1020

ACATAACAGG TGCTGTCAGT CCTTTATTAT TTAAACTTGT ACGTTTTGCT TCCAACAAAA 1080

TAGTACCAAG CTGCCTGGTG GGTATTGTTA TATATCCATT GGGTAATCTT CCCGTTGCGA 1140

CAAAAGCAAC AAACAAACGA GCTCCGAAGC TTGCTGTCGC ACCGTTATAA GTATTGGGGT 1200

TTGTATTGGC ACCTACAGGG TCAATATATA TACCTGAGCT ATTTATGGGG ACCAGAGGCG 1260

TTGCGGGCCA ATAGCCCGCC ATGCCAATAA TAATACCCAG TCCGGATACA CCAATATCAT 1320

AGATATCAAA ATCAGATGAA TCACGGCTGT TTCCTTGATG GAAAGTATAC GTAATACTTC 1380

CAATTTTAGG CAGTGCGGGT GTAAACTTTC CACGCATCAG AGCGATGGCA CCGCCATTAA 1440

AAACATACTG GTTACTTGTT CCCGCCAGCT CTCCTATCAC CCGGGGATAG GTATGGGCAT 1500

CAGCAGGACC AATCACAACA CCTGGCAATG TGGATGTATT AACCGCTATC TGCGAAGGCA 1560

CATAATCATC CGGACCCGCT ACCGCCAGCT TAGGGAGTAA AATTAAAAAC AATGGTATGA 1620

AAAAGATTCT TTTCATGTTT TTTCCTGATT AGGGTGCTGT ATACACAGAA CAGGAACGAG 1680

CTGAGATTGC ATATCATCTT TATTGTGTGC AACATGATAT ACAAATGAAC ATCTGTCTTT 1740

ATTATCTGGT CCCCATACAA CGCTGAGATG ACCTTTTTCA GGGAGTCCCC TGGTAAATAC 1800

CTTCCCGGCC TGAGCGACAT ATCCGGCCAA CTGTCCATGT TCATCCAGAA CTTCAGAAGC 1860

CATTGGAGGG GGATTGCCAG TAGACATACG AATATCAAAT AACAGACTTC TTCCTGTTTT 1920

AGTGTCAAAT TTYACTAACG TGGCGCTATT AGCACGAGGA ATGATTTCCT GCTCCGTCGC 1980

CGATAATTCA ACATTCAAAT CTAAATTGGA GGGATCGATG CTAATTTGAT TTTTCTCATA 2040

GGGTGTAACA TAAGGAACAA TACCATTTCC CCAAAAATCC AGACGACTAC CAGAGGCATT 2100

ATTGATGGCA GCCCCCTGAG CTCCTTCAGC ATGGATAATG GCAAAAGTAT CACTCAGGTC 2160

ATTACTCAAT GTCACTCCAT AGGGGTGTGC GACCACCGCT CCCGACGCAC CAAATGACCT 2220

TTGATTATTA TTCTGAGTAT CATGCCCGAC TGTTGTGGTT ATATTTACAT AAGGTGAACG 2280

ATAACCCCCA TTCATTGCAT AACCGGAAGG CCCGTTTTCC TGGCTGTTTC CTGAAAGACC 2340

ATAAGAGAAC TGATTATCCT CCCCGCCAGT ACCACTAATT GATGTCTGAA TACTATTTTT 2400

CTCTTCTTTG CTATAATTTA AAACAGTGGA AAACACCGGG CTTTGAACAC TTNCCTCCCA 2460

GAGGGAGAGT AAAATTAATA TAAAATCTGT CATCACGGCG TTGTTGCTCA TTATCTCTTG 2520

ACTGAGACAA TCCAATTTGA TAGCCGAGTT GTTTCCAGAA GTTGCTGTAC CCCATCTGGT 2580

ATTCATTACG ACTTCCTTTA TGTCCCCAGT AATTATAGGT TGTTCCTGTT AAATACATCC 2640






WO 98/22575 



PCT/US97/21347 




CACCCCATTT TTCACCTAAT TCCTGGTTGA T7GAAATCTG GAATTGATTC CTGGGACGAT 2700

AAAACGCTGT ACTTTTTACA GAAACATCAT CAATAAACGC GTTGTGATTA GCTGATAGCG 2760

CATCCTTCAG ATGATAAAAA TGTTTTGATG AATAACGATA AGCCGCCAGA GTTATATTTG 2820 

TGTTTTGAGG GGTGGGAATA TTGGATGGCT AATAACTTGG AGTNGCAGGA CTAATAAACC 2880 

TTTTACGGCG GTTAGACCGG GAATACCNGG AAATGC 2916
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2677 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
ACCGCATCGC CAATCTCAGC GGCAGTGGTT TACATGTCTT CCGTGATGGA AGGTCATGGC 60 

ATCAGCTACC TCCATCTGCT CTCCGTGGTC ATCCCGTCCA CCCTGCTGGC GGTTCTGGTG 120 

ATGTCCTTCC TGGTCACTAT GCTGTTCAAC TCCAAACTCT CTGACGATCC GATTTATCGC 180 

AAGCGTCTGG AAGAGGGCCT GGTTGAACTG CGCGGTGAAA AGCAGATTGA AATCAAATCC 240

GGTGCAAAAA CGTCCGTCTG GCTGTTCCTG CTGGGCGTAG TTGGCGTGGT TATCTATGCA 300 

ATCATCAACA GCCCAAGCAT GGGTCTGGTT GAAAAACCAC TGATGAACAC CACCAACGCA 360

ATCCTGRTCA TCATGCTCAG CGTTGCAACT CTGACCACCG TTATCTGTRA ARTCGATACC 420

GACAACATTC TCAAYTCCAG CACCTTCAAA GCAGGTATGA GCGCCTGTAT TTGTATCCTG 480
GGTGTTGCGT GGCTGGGCGA TACTTTCGTT TCCAACAACA TCGACTGGAT CAAAGATACC 540

GCTGGTGAAG TGATTCAGGG TCATCCGTGG CTGCTGGCCG TCATCTTCTT CTTTGCTTCT 600 
GCTCTGCTGT ACTCTCAGGC TGCAACCGCA AAAGCAYTGA TGCCGATGGC TCTGGCACTG 660 
AACGTTTCTC CGCTGACCGC TGTTGCTTCT TTTGCTGCGG TGTCTGGTCT GTTCATTCTG 720 
CCGACCTACC CGACACTGGT TGCTGCGGTA CAGATGGATG ACACGGGTAC TACCCGTATC 780 
GGTAAATTCG TCTTCAACCA TCCGTTCTTC ATCCCGGGTA CTCTGGGTGT TGCCCTGGCC 840 
GTTTGCTTCG GCTTCGTGCT GGGTAGCTTC ATGCTGTAAT GACCCATYGC GGGGCGTTCA 900 
CGCCCCGCTT TCTTTCCCGC CGACTAACAT CCTTTCCCCG TCCGTTGTAT AGTGACCTCT 960 

CTCTTGCGGT TCCATCTGTT CTTGCGAGGT GTTTATGCTT GATGAAAAAA GTTCGAATAC 1020 

CACGTCTGTC GTGGTGCTAT GTACGGCACC GGATGAAGCG ACAGCCCAGG ATTTAGCCGC 1080 

CAAAGTGCTG GCGGAAAAAC TGGCGGCCTG CGCGACCTTG ATCCCCGGCG CTACCTCTCT 1140

CTATTACTGG GAAGGTAAGC TGGAGCAAGA ATACGAATGC AGATGATTTT AAAAACTACC 1200

GTATCTCACC AGCAGGCACT GMTGAATGCC TGAAGTCTCA TCATCCATAT CAAACCCCGG 1260
AACTTCTGGT TTTACCTGTT ACACACGGAG ACACAGATTA CCTCTCATGG CTCAACGCAT 1320
CTTTACGCTG ATCCTGGTAC TTTGGAGCAC TTCCGTTTTT GCCGGATTAT TCGACGCGCC 1380
GGGACGTTCA GAATTTGTCC CCGCGGATCA AGCGTTTGCT TTTGATTTTC AGCAAAACCA 1440
ACATGACCTG AATCTGACCT GGCAGATGAA AGACGGTTAC TACCTCTACC GTAAACAGAT 1500
CCGCATTACG CCGGAACACG CGAAAATTGC CGACGTGCAG CTGCCGCAAG GCGTCTGGCA 1560
TGAAGATGAG TTTTAGGGGA AAAGCGAGAT TTACCGCGAT CGGCTGACGC TTCGCGTAAC 1620
CATCAACCAG GCGAGTGCGG GAGCAACGTT AACTGTCACC TACCAGGGCT GTGCTGATGG 1680
CGGTTTCTGT TATCCGCCAG AAACCAAAAC CGTTCCGTTA AGGGAAGTGG TCGCGAACAA 1740
CGAAGCGTCA CAGCCTGTGT CTGTTCCGCA GCAAGAGCAG CCCACCGCGC AATTGCCCTT 1800
TTCGGGGCTG TGGGCGTTGT TGATCGGTAT TGGTATCGCC TTTAGGCGAT GCGTGCTGCC 1860
AATGTACCCA CTGATTTCTG GCATCGTGCT GGGCGGTAAA CAGCGGCTTT CCACTGCCAG 1920
AGCATTGTTG CTGACCTTTA TTTATGTGCA GGGGATGGCG CTGACTTACA CGGCGCTGGG 1980
TCTGGTGGTT GCCGCCGCAG GKTTACAGTT CGAGGCGGCG CTACAGMAGC CATACGTGCT 2040
CATTGGCCTC GCCATCGTCT TTACYTTGCT GGCGATGTCA ATGTTTGGCT TKTTTACTCT 2100
GCAACTGCCG TCTTCGCTGC AAACACGTCT CAGGCTGATG AGCAATCGCC AACAGGGCGG 2160
CTCACCTGGC GGTGTGTTTA TTATGGGGGC GATTGCCGGA CTGATCTGTT CACCYTGCAC 2220
CACCGCACGG CTTAGCGGGA TTCTGCTGTA TATCGCCCAA AGCGGGAACA TGTGGCTGGG 2280
CAGCGGCACG CTTTATCTTT ATGCGCTGGG CATGGGCCTG GCGCTGATGC TAATTACCGT 2340
CTTTGGTAAC CGGTTGCTGC CGAAAAGCGG CCCGTGGATG GAACAAGTCA AAAGCGCGTT 2400
TGGTTTTGTG ATCCTCGCAC TGCCGGTCTT CCTGCTGGAG CGAGTGATTG GTGATATATG 2460
GGGATTACGC TTGTGGTCGG CGCTTGGTGT CGCATTCTTT GGCTGGGGGT TTATCACCAG 2520
CNTACAGGCC AAACGCGGCT GGATGCGCGT GGTGCAAATA ATGCTGCTGG CAGCGGCATT 2580
GGTTAGCGTG CGCCCACTTC AGGATTGGGC ATTTGGTGCA ACACATACCG CGCAAACTCA 2640
GACGCATCTC AACTTTACAC AAATCAAAAC AGTAGAT 2677
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 537 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
ATCCTGATGA CGCCGTAAAT GTGCATTTGC CAGGATTGCC GCATAGAGGG CACGAAGAAA 60

AGGTCGGTTG TCAGGATGTA TCCAGATGAT TCTGCCACTG AAACCTTCAG GGATAAGACG 120

ATTGCCAACT GCCAGTCCTT TAAGGGCAGC ATTCAGCCCC TTAGGCGGGG CATTCTGCTC 180

CAGAAATACG TATGCCAAGT GAGCGTGTAC ATCAATAAAG TCATTCTCCT GTCGGGGAAG 240

GCGCCTGAGT TTGTTGATGT AAGTTGTTTC GCTGATTTGA TCCGCATCGT ATGGATCAAT 300 

CAGTTCTTGA AACTGATCCA GCAACGAGCC AAACCAGGTT TCCGGAAATA TGAAACAGCC 360

CTGGTTATCG TTCACTTCAA AGCGTAATTT GCCAGTGATA TTGTGAACCT GTAAAAAAGG 420

ATAGAGCATA ATCTGCAGGC TATAAAAATT GTGGATGGCT GGCATCGGGT GTCCTTTTAT 480

TGTCCGGGAT TAACGTTGCC CATGATAATA CAGTGAATCC NGTTCTGTGG TAAGACG 537



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

CGCTCGAGCA CCAGATTCAC TGACATGCGC AAACTCATGT GTAAATCCTG TCTGGGCATC 60 

TATCTCAAGT AACAGTTCCG TTAAATCTAC CGGTGGGAGT AGCTGTTTGA TCCGATTATT 120 

TAGACGAAGC AATGATGGTG GCTCTTCCTG TTTCTCCAGA CAACTGATAG TCAGGGATGG 180 

ATATTTACCT TCATTACAGA TATGAACTTC CGCATTCTTT TCAAATCGTG ATGCCAGGCT 240

TTCCAGGTCT CATCCAGCTG AATAGCCAGT TGTTGCACAC CTTTACGTCC ATCGACAGGA 300 

TGTCCCAGTG CCCGACAGAC AGGAATACGC TGAGTCTGCC ACTCTTCACC TTGCAACAAC 360

TTCTCGCGAG GATCTCCCCA GCGATCACTG TTTTCAAGCC CAGATGTCCC CGGCGGCGCA 420

RTGCATCCTG AAGGCGTTCC AGCAAACATA GTGAATAACC TGCACGCTGT ATCCCGTCCC 480

TCCGCATCGT ATACGAGGCG TTTCCAGGGA CCGGTGATAA TATGTTCAGC GCATCATCAA 540

GGATGCGCTT TTTCGAACCA TTCAGTTCTG CCAGATAATG AATCGCAGCC AGTACATGTC 600 

ACCTGCCGGT GCCGCACGGA AATGCAGGTC CCGCAACACC GCCGGAAGAA AACGTTTAAC 660

CCGACCGTAC TGCTCAACCA TTTCGTCATG GAAATTATTG TTCTGTGGAC GAGCAAGTTC 720 

ATTAACCTTG CTTACAGATT CTGCCAGTCT GTTTTTGGGT ACGCACTTGA AGATAACCTG 780 

CCTGAGATCT GGGACATCTG TATTATCATC CAGCAACAAT GCACATGCCC GCGCCAGTAA 840

CAATGCGGCC TGATCAAGAT CTTTCAGTGT CCTGAGTCTT TTTTTTTGCC CGGTTTTCTT 900 

TGCTTCGCGG ATAATGTCCA GAATTAGCAT ATCAAGCACA TCAACGGCAT CGTCTAATGC 960 

CGTTATTTCC TGTGCTTTAA CGAATGCAGT AAGTACAGCA AGCTTTCTCT GCTGTGGCAT 1020

TCGAGCGATA TATTTTACCG ACGCCATGCC AGCATGAACG AGCCAGATTA CGCNTTGGNA 1080

ATGGTCAGGC AGACCGGGAA AAGTTCCAGT CGGGNAAAAC TCCAAGAA 1128 
(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2311 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
GGNTGATAAA AATCYTTTGA TGAATAACGA TAAGCCGCCC AGAGTTATAT TTGTGTTTGA 60
GGCTGGAATA TTGATGCTAT AACTTGAGTG CAGACTATAA CCTTTACGCG TTACACCGGA 120
ATACCTGAAT GCTGTTCTGG ACAATGTAAT GTCAGATGCT ATAGCACCCA GATGGGTATT 180
AAAGGCCAGG CCAGCTAACC CCGCTGTATA TCCTGAAGCT GTGGTAAGAC CACTGTTTAA 240
AGTAATATCA TTCGTCAGGC CGTATTGATA GGTGCCTTGT GCTATTAAAT CATTATATGT 300
TTTATTCGCA TAACGATACT TTCCCACTGA CATTTGCCAG CGACTAAATC CGGGACGAAT 360
GAGTTGAGCA ACGGCCGCAA AAGGAACCGT GAACATTCGT GTCTGGCCAT TAGACTCTGT 420
TATCTTAACG AGAAGGTCAC CAGCATATCC ACTGGGATAT AAATCATTGA TGACAAATGG 480
TCCGGCTGGC ACCGTCGTTT CATAGAGGAT ATGAGCATTT TGATAAATGG TTACTTTAGC 540
ATTACTGTTA GCTATTCCCC GGACAGCAGG RGCATAGCCA CGTAAAGAAC CGGGTAACAT 600
TCGTTCATCC GATGCTAACC TGACTCCCCG CAAACTGAGG CTATCCATTA GCTCACCATT 660
CGTATAAAAA TCCCCTAATG TGAATTGTGC TCTCAATGGG GCAAGGTCAT GCATTATACT 720
TGTTTCTATA TTCTGATATC CGGCAGGATA GCTATTATTC CAGCTCTCAC TGCCACGGTG 780
GCGCAAAGCC ATCCCCACAA ATTGAATCCA GCTTTTAATC CCAGATAAGT CTGTTCGTTA 840
CTCGTCCCGG AAGAGCTATA CTGGTAATAG TTAGCATCAT AGTTTATAAA TGCTGCAGGA 900
ACACCACTTT GCCACTGAGA AGGGGAAATA TATCCTCTTG GACGTGTATT CAGCAGTGCT 960
GCGGGATTTC GATATTCAAC CTTAAAGTCG ATAAGTCAAA ATTAATTCTG GCTGAAGAAA 1020
GCCCTGTTGA CGCCGGAAAG CAGGAGGTGT TTCCCGACAT AGTATCTTTG ACTAAATCAA 1080
TCAATGAAAG CAGCTCAGGC GTCAGGCATA ACGTCGGAGC ACCGGTATTG GCAGTACGTA 1140
AATACTGCAA ATCAGCCTTC CCCTTCCATA CATTATTAAC ATAAATATCA GAATAATACC 1200
TGCCCTCAGG CACAGGGTTA CCATGACTAA AGCGGCGGAT ATCAATAGCA TTTATCCCTT 1260
TATCCAAATG CAAAAACTCA GAATCAAACT CAGCCTCTTC AGCAGCAAAT GAATGGTTTG 1320
TTACTGTTAA CCCTAATGCA GCAAAAAGCA GAAGAGAACA ACGACAGTAA ATCAGGCATG 1380
ACAGATTATT AGCGTTCATT ATTACCTTAC TCCAGAACAG ATTCTCCTTG CTGATATCCT 1440
CCGTAATCAT TAACAATAAC CCAGGAAACT TTGCTGGTGG CGCAGTTCTG CCTTTAAGTG 1500
CAAATACTGT TGAAGAGAAA GGGGGAATCA TTCCACCATG TTCAACAGGC GTTAAGTGCT 1560
TATTCTGGTC AACTGCAATT TTG77GTAGG TTATGTAATA AGGTGTTGGA TTAACTGCTT 1620

TAATTCGGCC TTCCTCCTGG TGCCAGGTAA CTTTCAGATA AGCATCATTT GGTGTTAACT 1680 

TCAGGTGAGC AGGACGAAAG AAAAATTTTA TGCGAGTAGG AACAGCTAGT TGCAAATAAT 1740

TATTATTCCG CTGCTCTGAG TTA7CGGAGT CTTTTTTTGC CCTGGGCTTT GCTGGAATAT 1800

CCAGAACATT TAGATAGAAA AGAGATTCTC GGTCTTTCGG TAGTGACTCG CCTGTATATA 1860

CAATTCTGAC TGTTTGTCCT GATTTAGAGT GCATACGAAA TATTGGCGGA GTAATGATAA 1920

AAGGACGTGG ACTGACTCAG GGGGAGCTGC TGCATCTCCA TCGYCAACCA GGACTGGACT 1980 

AATGGCGAGA TTTCATTGTC ATTATTTNAA CGTATGCTAA TACTCTTTTG AGTCGCCGGA 2040

TAAACAACAC GGGTTCCCAT GATAAGTACA CTACCCTGAA CAACTGCAGA TACAGATAGA 2100 

GTAAAAAAAA ACAGCACAAA CCTTAGCATG GTATCTCCAG AAGAAAGCAG GGCAGTATTT 2160 

CCTGCCGCAA AATACAAAAC CGTTTGTTAT TCGTAGGCGA TGGTATAATT GACTGTTGTT 2220 

TTTACATTGC CTGGAGTTGA TGTCCCGGTC GCATAATATT GAGCCATATA ACGTAATGTG 2280 

GCATTACCAT CCCCACCAAT AGT7TCAGAA T 2311 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

TATTACCTGT GATTTTTCCG GGCGTAAATG GAGTCCCTAA AGTTATCGCA GTCCCAATAT 60 

TTCCTGCATT ACTGTTATAA AGATAAACGA GTAACCCATC AGAAGATGTG TTTGATGTAT 120 

TCTGAACTAA AATAGCATTG TNATAAGTGT TTGTTGCCGT TATCGTAACC TTCATTGTTC 180 

CCAGATTATA GGGACACCGC ATATTCACAG TAAACTCTTT TTCGTGANTT CCATTTTGAC 240

TCAGGGTCTG AATCTCTACA NCCTGCCAGT CAACAGTTGT GTTGCTTACA GTACAGGCAG 300 

GAATAATCAG TTTTCCTCTG AAGGTCAGAT TATCAACTGC ATGTACATGC TGAGACATTA 360

ACACTGCCCC CAGCATTACC GGAAGACACA AACCTCTTAT CTTTTTCATC TGAAATATCC 420

TGTACAAAAA TTTTGCTAAC GATATGTCAA TTCAAACGTG GCTGTTGCTT CATAATCACC 480

GGGTACCACA CTCTTCGTCC GCAGGGCTTC CGGCGTTGCC AC/VAC ATACG CGCCGAAAGG 540

AAGCTCAAGA CTGTTTCCGG TAACCTTTTC CCCCTGGCCT TTGTTATGGG AGGTGCCGGG 600 

TTTCAGCAGA CTGCTGCCAT CGGTGTCCAG CAGTGCAATG CCTAACCGGC CAGCATTCAC 660

TCCGGTTACC TTCAGATGGC CCGGGAGRCG CYNTCTTCCG TCCCCTTAAA GGTCAGGGTC 720 

ACAATTTTGC CAACTGCTGT TGCATGGCAG TTTTCCAGCC TGATGACAAA CGACTCTGTC 780

GGCGAACGTC CGGGCGGATA CCAGAAATCC CTGGACGCCC GGGTTTTGAA GACGACATGT 840

TTATTCAGAC TGTCACGGGA CACATGGCAG GGTCTGTCAA GCAGATTACC CCTGAATGCC 900
ACATCTGAGG CTATTGCCTG TCCGGCAGAC AGTGCGGCAA ACAGTAAAAG AGCGCCTGTG 960
CTTTTTATCA TCACATTCCC TTA2TCATAT TTTATGCTCA GACGCAGCAT GGCCGGATTG 1020
CTCCTGGCAT CAGAATACTC AAC-TCCTGT GGCGGCCTTT TCCTCCAGGC GGGCAAGCAT 1080
CTCCTCCTGG CGGCGGGTAA GGCGGGGACA GTAAAAAA 1118
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 562 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

TTCGTGGGTG AAATCGTAGG CCGCGCTTTT TTGCTGATCG GCCAGTTGAT GAATAGGGTG 60 

GCCAKGATCG GGATAAAACG TACAGGCAGC GATAAACAGA CAGCCCGGAT AGCGGTTGTT 120 

TTTAACGCAC TCCGATAACG CCTGATAACG TGCCAGCAAC TTTTGTTCGG CGGTTTGCGT 180 

TTCGTCCAGC ATCAGCTGAC GACGCCAGAC ATCTATCTGT TGGCTAAGAT AACGCAGCGC 240 

ATCGTAGAGG ATTGCCTCTT TGTCTGGCCA GAAGCGGCGT ACTCGTCCAG TGGATAATCC 300 

ACACGTTCAG CAACCATCTC CAGCGTGGTG TTGGCAATCC CTTGTAATTC TAATAATTTC 360

AGGGCTTCTC CCAGTACATC TTCACGTTGC ACGCTATTTT CCTCCGKCTT TCCCACTGCA 420

ATGTTCGKTC ACGGTTGGCG ATCGCGCAAA TGTGCGCTGG AAGGTTTCAG CATCCATAAA 480

GCCCGTGACG CGTGCTTGTG GATGCTCCTG GCCTTGGTCC GGTCAAAAAA GAGAATTTGT 540 

CCGGTAGGGC CAAGGATATT AA 562
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 745 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

CCATCGCTTT ACCCCAGAAA AGTTAAGCCA TATAATGTGA GGGATATAAG TCGTCGTATC 60 

CGGTAAGTAC AGATAACCAC AACATAAGCT CATTCAGTAA ATTTTATCTC TGAACAAACG 120 

ACTATGGCAT GCTCATTTAT ACTATTCATA AGAAAGTGTG ATTATCTGTA AGCATTAACC 180 

ATCAAATCAT ATAACCATAC TAAACTGGCG GATCATCAGC ACCATTAGCA GGTAACTTAT 240




TGAAATTTTA TTATGTGTTT TTTGTTGATA ATTAATATGC AATATGAATT TGCTATTTTA 300 

GAATGATGAA CAGCATTTAA AATTAGCATC ATTAACATCA TATAAAAATA TATTTTTACT 360

AAAACATGAA TTGTATATAT TTATTAGCTC AGGAAAATTA TCAGGGTTCA CCTTCAAATT 420

AACCTGAATG TTATGCTTAA TTTCACCCAG TAGTTCTTCA TGTGTAGATT TTATTATCCC 480

ATTATTATAA TCGATAAATG CACACATGTT TTTTATGAAT TCAAAACCTT TTCCTGTATA 540

CAGTTTAATG AATGCCACCA GAGCAAACAT TTCAAGATGT AGCCATAATG CTACGTTAGT 600 

TTTTTGCAAA GTATAAAAAA TTGAATTCGC CACTTTTTTA CTTATTGCTC TTTTATACTG 660 

TGATCGAGCA AGATTCAGTA GCGGAAGTCC TCGTTCAATA AATGAATGTG AAAAGACTGG 720

ATAAATTGAT GTCGGAAACC TTTCA 745



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GCGTTNATGC ATTTCGASAT TTTCCACTTC GTTCTGACGT TGCACTGCTT TGGCGTCATC 60 

ATTACGTAAC GTATCGAGGA AATCGAGGTA GCCCTGATCA ACATCTTTGG TGACGTAGAC 120

GCCGTTGAAC ACCGAGCATT CAAACTGCTG GATATCCGGA TTTTCAGCGC GAACGGCGTC 180 

GATCAGATCG TTCAGATCCT GGAAAATCAA CCCGTCAGCA CCGATGATCT GGCGAATTTC 240

ATCAACTTCG CGACCGTGAG CGATCAGTTC CGTGGCGCTC GGCATATCAA TACCATAAAA 300 

CGTTCGGGAA AGCGAATTTC CGGTGCCGCA GAAGCGAGGT ACACTTTCTT CGCTCCGGCT 360 

TCGCGTGCCA TCTCGATAAT CTGTCAGAAG TGGTGCCACG 400
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 824 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

TGTCGACGAT GAGGCAGCCA GAGCATTAGA GCCGAAAAGA AGGGATGATG CCATGACTGC 60 

TGTTGCTATA AAATGTTTCA TATATTCTCC ATCAGTTCTT CTGGGGATCT GTGGGCAGCA 120 

TATAGCGCTC ATACTAGGGG TTTGAGGGCC AATGGAACGA AAACGTACGT TAAGGAGATA 180

ATTCGTTGTT TATATTTAAA TTTAGAGCTC TCAGTTCCCC TTTTAAAATA TCCTCTGGCA 240

ACGTGAATGT ATAATGGCCC AACA7ATTGA TATGCCCGTG CATCAGGGGA GATAGCCGAG 300

CGATATCTTC ATCTATAATT TCTT'-GCCAT TACGGCGCAT CCAGCTCAAC GCTTCCTCCA 360 

TATAGAGCGT GTTGCACAGA ACCACTGCAT TAGTAACCAG GCCCAGCGCC CCCAGTTGAT 420

CTTCCTGCCC TTCACGATAA CGCTTTCTGA TCTCTCCGCG TTGTCCGTAA CAAATCGCAC 480

GAGCCACAGC GTGCGKTCCT TCTCGTCGAT TAAGCTGCGT CAGGATCCGC CGACGATAAT 540

CTTCATCATC AATATAATTG AGGAGATATA GCGTTTTGTT TACACGCCCT ACTTCCATAA 600 

TTGCCTGTGC CAGTCCTGAT GGGCGCGAGC TTTTCAGTAA AGAGCGAATG AGTTCTGACG 660 

CATGAATTGT ACCCAACTTC AGGAACCAGC GGTTCGCATC ATCTCATCCC ACTGACTCTC 720

CGCTTTTGAC AGATCTGCAT ATCCTCGGGC CAACTTATCC AGTACTCCGT AGTTTGCCGA 780

TTTATTCACC CGCCAGAACA CCGCCTCACG TGCATCGGCA AGCC 824
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 911 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

ACAAATCAGA CCAGTTAACC AGTCAGTCGG TTTTATGATT TCACTCACTA TACTTTGTTT 60 

CATAAGGATT TCAGGATCTG CCAGACTGCG CAGAAATGAT GCTTACGAAT ACACAGTAAA 120 

GGCAATGTCA TTTCCGATAC AGAGCCTGAC ATTGCCATAA TGAGCTATTT ATCTGAAAAA 180

CGACAGAATA TGATGTTTTA TCGTAACGTA ATTTTAAGTT CTCAACTTAT TGAGACATAT 240

TGTCTTTTTT ACCCATGTGG TCATTTTTCA TCCCATCCGT TTTGCTCATG TGTTCTTTCT 300 

CCATTTTCTC TTTATCCATT GCATTTTTGC ACATACCATC CTTGCACATT TTATCATGCG 360

CGCTGGACAT GCTGCCTTTT ACTTCATGTG TTTTATCCAT TGTGTCTGCT GCCTGAGCAT 420

TGAACATGAA CAGCGCGGAT AGTACAGTTG CAGAAATAAT ATTTTTCATG GTTCTTCCTC 480

ATTTTTAACA ATTGTATCAA CAACCACCAA ACCAGTTATA ACCCTGGTCT TCCCAGTACC 540

CCCCCGGAAA ATGATTAGTG ACCTCTATAA CCTGAACATG CTTGGGGTTT TTATATCCCA 600 

GCTTAGTAGG GATACGTATC TTTATGGGAT AGCCATATTC TTTTGGCAAT ACCCTGTTAT 660

TCCATGTCAA TGTCAGCAAT GTTTGTGAAT GTAGTGCTGT CGCCATATCA ATACTGGTGT 720 

AGTAACCATC GACGCAACGA AAACTGACGT ATTTTGCCCG CATATCGGCA CCAATCAGCG 780

TCAGGAAATG CCGGAATGGT ATCCCTCCCC ATTTTCCTAT TGCACTCCAT CCTTCAACAC 840

NGATATGACG GGTTATCTGA CTCACATGCT GCATGTTATA CAATTCAGAC CAAAAACCAG 900
TTACGGGTTA T 911

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 463 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
NGGGGCAGGA TAATTGTATC CTGCCCNGTA TATAATTCTC AGCACAGGTG TTGACTAAAG 60 

AGCGTGAAAC TTTGCTATTA TGTCTTCGTA AGATTCACGG ACGGTTATAC TTGAGCCTGA 120 

TTCTGTGAAG TAAACAACAG CAGAAGCATC GTTGCCTTTT TCAATGTATG AAACATTCCA 180

GTCATGGATA GCCACTGCGG GCTGACCATT ATCCCGACGG TGCGTCTTAA TGAATCGCGG 240 

AAGTAATTCT GCAATATCGT TAAAAACACC ATTTACGGTA TGAGTGATAC CACCAACGCA 300 

ATGTAGATGA GTTGACTCCG GGGTATCATT GTCTGCTTCT GCAAAGAGTA TAGCTGTCTT 360

GCTAATTGTA ACAGGCGCCT GTGARCGGGA TAATTCGAGA GAAATAAACC CGGATTCTGC 420

CATAAAAACT CCAGTTTGTG ATGTTATATC ATTTCATATG TTT 463
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 565 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

TTCTAACCTC TGACCAAAAA CAGAATTACG GTTGTTATGC TGCAGAACCT AATGACGTGC 60 

AACTGGCGCG CTATTTTCAT CTTGATGAAC GGGATCTGGC CTTCATTAAC CAACGACGGG 120

GCAAACATAA TAGGCTGGGC ATTGCGCTTC AGCTCACCAC AGCCCGTTTT CTGGGAACAT 180 

TTCTGACGGA TTTAACTCAG GTTCTGCCTG GTGTTCAACA TTTTGTCGCG GTACAGCTTA 240

ATATCCACCG TCCAGAAGTT CTCTCCCGCT ATGCTGAACG GGACACTACC CTTAGAGAAC 300 

ATACTGCATT AATTAAGGAA TATTACGGCT ATCATGAATT TGGTGATTTT CCATGGTCTT 360

TCCGCCTGAA GCGTCTGCTA TATACCCGGG CGTGGCTCAG TAATGACGAC CGGGTCTGAT 420

GTTTGATTTT GCCACTGCAT GGTTGCTTCA AAATAAGGTA TTACTGCCCG GAGCAACCAC 480

ACTAGTACGT CTCATCAGTG AAATTCGTGA AAGGGCAAAT CAGCGGCTGT GGAAAAAGCT 540

GGCCGCACTG CCGAACAAAT GGCAG 565
(2) INFORMATION FOR SEQ ID NO: 35: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 512 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

CGATGGCGTC CGGGGTGAAC GCCGGATAAG TTTAATTTAT CCGGTCAGGC AAAAGGCATT 60 

AATCTGCAGA TAGCTGATGT CAGGGGAAAT ATTGCCCGGG CAGGAAAAGT AATGCCTGCA 120 

ATACCATTGA CGGGTAATGA AGAAGCGCTG GATTACACCC TCAGAATTGT GAGAAACGGA 180 

AAAAAACTTG AAGCCGGAAA TTATTTTGCT GTGCTGGGAT TCCGGGTCGA TTATGAGTGA 240

GTCACTCCGG TGAGATGTCC GGTTATTTAT CTTTTTTGTG AATCTGGTGA TGCGTGGAAT 300 

GAAAGACAGA ATACCTTTTG CAGTCAACAA TATTACCTGT GTGATATTGT TGTCTCTGTT 360

TTGTAACGCA GCCAGTGCCG TTGAGTTTAA TACAGATGTA CTTGACGCAG CGGACAAGAA 420

AAATATTGAC TTCACCCGTT TTTCAGAAGC CGGCTATGTT CTGCCGGGGG CAATATCTTC 480

TGGGATGTGG AATTGTTAAC GGGGCCAAAG TA 512 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 827 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
TTGCCGGTGC GGTTANTAGT GGCAGTGGTG TCTTTTGGTG TAAATGCTGC TCCAACTATT 60 

CCACAGGGGC AGGGTAAAGT AACTTTTAAC GGAACTGTTG TTGATGCTCC ATGCAGCATT 120 

TCTCAGAAAT CAGCTGATCA GTCTATTGAT TTTGGACAGC TTTCAAAAAG CTTCCTTGAG 180

GCAGGAGGTG TATCCAAACC AATGGACTTA GATATTGAAT TGGTTAATTG TGATATTACT 240

GCCTTTAAAG GTGGTAATGG CGCCAAAAAA GGGACTGTTA AGCTGGCTTT TACTGGCCCG 300 

ATAGTTAATG GACATTCTGA TGAGCTAGAT ACAAATGGTG GTACGGGCAC AGCTATCGTA 360 

NTTCAGGGGG CAGGTAAAAA CGTTGTCTTC GATGGCTCCG AAGTGATGCT AATACCCTGA 420

AAGATGGTGA AAACGTGCTG CATTATACTG CTGTTGTTAA GAAGTCGTCA GCCGTTGGTG 480

CCGCTGTTAC TGAAGGTGCC TTCTCAGCAG TTGCGAATTT CAACCTGACT TATCAGTAAT 540

ACTGATAATC CGGTCGGTAA ACAGCGGAAA TATTCCGCTG TTTATTTCTC AGGGTATTTA 600 

TCATGAGACT GCGATTCTCT GTTCCACTTT TCTTTTTTGG CTGTGTGTTT GTTCATGGTG 660 

TTTTTGCCGG TCCGTTTCCT CCGCCCGGCA TGTCCCTTCC TGAATACTGG GGAGAAGAGC 720 
ACGTATGGTG GGACGGCAGG GCTGCTTTTC ATGGTGAGGT TGTCAGACGT GCCTGTACTC 780

TGGCGATGGA AGACGCCTGG CAGATTATTG ATATGGGGGA ATACCCC 827



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 400 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

CCAGGGGCCC AAAATCCGTG TATCCACCTT TAAAGAAGGC AAAGTTTTCC TCAATATTGG 60 

GGATAAATTC CTGCTCGACG CCAACCTGGG TAAAGGTGAA GGCGACAAAG AAAAAGTCGG 120 

TATCGACTAC AAAGGCCTGC CTGCTGACGT CGTGCCTGGT GACATCCTGC TGCTGGACGA 180 

TGGTCGCGTC CAGTTAAAAG TACTGGAAGT TCAGGGCATG AAAGTGTTCA CCGAAGTNAC 240

CGTCGGTGGT CCCCTCTCCA ACAATAAAGG TATCAACAAA CTTGGCGGCG GTTTGTCGGC 300 

TGAAGCGCTG ACCGAAAAAG ACAAAGCAGA CATTAAGACT GCGGCGTTGA TTGGCGTAGA 360

TTANCTGGCT GTCTCCTTCC CACHCTGTGG CGAAGATNTG 400
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 578 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

CCGATTTTTT GCGAAACGTT CCGCCTGGCA TCAGGATAGT TTGTTCGTTA TCCAGTTCGG 60 

ATAGCGCATT GACGATATGC AGGCTGTTGG TCATCACCGT GATGTNATTA AAGCGCGAGA 120 

GCAGGGGAAC CATCTGCAAA ACGGTACTGC CAGCATCAAG AATGATCGAA TCGCCATCAT 180 

GGATAAAACT AACGGCAGCT TCTGCAATCA GCTCTTTCTT GTGGGTGTTG ATGAGTGTTT 240 

TATGATCGAT AGGCGGATCG GATTCCTCTT TATTCAACAC CACTCCGCCA TAAGTACGAA 300 

TGACGGTTCC GGCATGTTCC AGAATGACCA GATCTTTGCG AATGGKTGTG CCTGTGGTGT 360

CAAATATTGC GCCATTCTTC AACCGAGCAT TTACCCTGCT TTGCAGATAC TCCAGAATGG 420

CGGCCTGACG CTGACGAGTT TCATGGGCGT GATACCTGAT TTAGGTTCAA ATGATAACTC 480

GCAAGCAGTA ACATCACACG NAATATCCAC GTTCAGTTAA GCGCCATGAT AGAGCATCCG 540

TGATAGGGNC AGGGGNAGTC ACACGGCGTA ATCACCGC 578 
(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 399 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

TGTTAGGTCA GGGCCCACAG TCAAGCTTAG GTTTTACTGA ATATACCTCA AATGTTAACA 60 

GTGCASATGC AGCAAGCAGA CGACACTTTC TGGTAGTTAT AAAAGTGCRC GTAAAATATA 120

TCACCAATAA TAATGTTTCA TATGTTAATC ATTGGGCAAT TCCTGATGAA GCCCCGGTTG 180 

AAGTACTGGC TGTGGTTGAC AGGMGATTTA ATTTTCCTGA GCCATCAACG CCTCCTGATA 240

TATCAACCAT ACGTAAATTG TTATCTCTAC GATATTTTAA AGAAAGTATC GAAAGCACCT 300 

CCAAATCTAA CTTTCAGAAA TTAAGTCGCG GTAAATATTG GATGTGCTTA AAGGACGGGG 360 

AAGATTTCAT CGACACGTCN GCGTGCAATC TATCCGTAT 399 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 327 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

CAGCCTCCGT TACCGGACAG CAAGGAGGCT GAATGGAGTT TACAGGATTT GCTTTTTTAT 60 

AATGTCTGGC CATGCAGTMA AACCGGACAG GTTTTATTAT CATGTGAGGT ATTCTGACAT 120

AAAATGCTGG ATTTTTATTT TGTGACGAAT GCTGCAAAAT TGCATCTGCA CTCTGATGTA 180 

GCTTTTATCT GTTTCAGTGA AGCATGCCCA CAAACTGAGT TATTAAGTTG TGGAAGAACA 240

GTTTTGTCCC GCCTGCATAT CTCCTTTCAA AAACCAGTAT GTCGCCATGC CTCGCCTTAA 300 

TGGAGAGCGC TGAACCATAC CTTCTTT 327
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 314 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
GGAGATGGGC ATGGAACTCA CTTCATAATA ATGCCTACCG AAGAAATATT AATAGATGAC 60 
ATTTCCACGA GNGATAGCAA TAAAACATCA GAGCAGTCTT CTCGCTTAGA AAAAGCTTTA 120 
TTAGGTTTTA CAAACACAAT GTACAGTGAT TCAAACCCTC CTATTATAGC TCGTTTTAGA 180

GACTATCTGG AAGATGGTGA GTGCATTGAC AGAATTAGCG AATCAATTTT TTTTACACCG 240 

CAAGAATTCA ATCTTGCAGA TCACCACATT GAAGGATGGT TCAATGAATT TGGTCAATTC 300 

AGTGGAACTG TTTC 314 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 590 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

TCCCAAGATC TTTTTGGCCG CAAATCCACA AAACCCGTCG TTANTGTCGC GCAGCCANTT 60 

GCAGGCCGAA TTTGCACCGT TTTAGAAAGC GGCGTTTTGT AGAGCAGCAC GCAGTGAGAA 120 

GCCACCGCGC CACGACCTAC GNGCNCGCGC AGCTGGTGTA ATTGCGCCAG ACCCAGACGC 180 

TCCGGGTTTT CGATAATCAT CAGACTGGCG TTAGGCACAT CAACGCCGAC TTCAATAACG 240 

GTTGTGGCAA CCAGCAGGTG TAGCTCACCT TGTTTAAACG ACGCCATCAC CGCCTGTTTC 300 

TCGGCAGGTT TCATCCGCCC GTGTACCAGG CCAACGTTCA ACTCTGGTAG CGCCAGTTTC 360

AACTCTTCCC AGGTAGTTCC GMCGCCTGCG CTTCCAGCAA TTCCGACTCT TCAATCAACG 420 

TACAAACCCA GTATGCCTGA CGACCTTCAG TTATGCAGGC GTGGTGCACC GGGTGCAATG 480

GATGTCGGTA NNGCGGGTAT CAGGAATAGC GACCGTAGTC ACTGGGCGTG CGGCCTGGGC 540

GGCACTCCAT CTATCACCGA GGGTATCGAG ATCGGGCATA CGCNTGCATT 590



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

GACGAAAGGG CCTCGTGATA CGCCTATTTT TATAGGTTAA TGTCATGATA ATAATGGTTT 60 

CTTAGACGTC AGGTGGCACT TTTCGGGGAA ATGTGCGCGG AACCCCTATT TGTTTATTTT 120 

TCTAAATACA TTCAAATATG TATCCGCTCA TGAGACAATA ACCCTGGATA AATGCTTCAA 180 

TAATATTGAA AAAGGAAGAG TATGAGTATT CAACATTTCC GTGTCGCCCT TATTCCCTTT 240 

TTTGCGGCAT TTTGCCTTGC CTGTTTTTGC TCACCCAGAA ACGCTGGTGA AAGTAAAAGA 300 

TGCTGAAGAT CAGTTGGGTG CACGAGTGGG TTACATCGAA CTGGGATCTG CAACAGCGGT 360
AAGATCCTTG AGAGTTTTTC GCCCCGAAGG AACGTTTTTC 400
(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 400 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
ATTCGGAAAG ATGCTTCTAN TTTTTTTAAG CACGTATAAA CTGTTAATTC AGGTTCAATG 60
CTACGAAATG CACTAGTTAT AACCTGTATT GAAGGAAAGA TCTTCTGATA CTCTTTCCAG 120
AGATCTTCAA GTCTGGCCAT GGAAATTGAC TTGGCTGCAT ATTCTAGGTC AGTGTTTATG 180
ATAGTTTCTC TATTCTCTCT GAATGCGGAA AAAAAAGCTT CATTCAACAA TGATAGTAAA 240
TCCCTGGGCC GGTAAAGGGT AAATTGCAAA CATCGCTTAA AACCATTCCT CCCTTTAAGA 300
TCATCCGCTG TGCATCTATC CCAAACTCGT TGATCTTTCT CAATATCTAG CTTAAATGCT 360
ACTTTCATTC TTTTAGCTGA CAGCATTAGG AGTTGTGCCC 400
(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 585 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

TAATGTTGAA GACAGAGATA TAATNTACAG CATCATCCCA CAAGGCAGAT ATAACAATAC 60
TTGACTGGGA TATGCAAAGC GATAGTGGGC AATTTGCTAT TGAAATAATA AAATCGATAA 120
TCGTTTCAGA TATAAATTCT GGAGGACGTT TACGTCTTCT TTCTATTTAT ACTGGTGNAC 180
ATGTTACTGC TGTTATAACT AAGTTGAACA ATGAGTTAAA GAAAACATAC CGTAGCGTAA 240
TAAAAAATGA TGATAGTATT TTTATTGAAG ATAACTATGC ACTCGAACAA TGGTGTATAG 300
TTGTTATTAG TAAAGACGTT TATGAAAAAG ATCTTCCAAA TGTGTTAATA AAAAAATTCA 360
CTAACCTTAC AGCTGGGTTG CTATCCAACG CCGCACTCTC TTGCATTTCT GAAATAAGAG 420
AWAAAACCCA TGGGATATTA ACAAAATATA ATAATAAATT AGACACTGCA TATGTTTCCC 480
ACATCTTAAA TTTAATAAAA TCCAAGGRGT CAAGGGCATA TGCTTATGAA AATGCTCATG 540
ATTATGCAGT AGATTTAATT TCTGAAGAAA TAAGATCAAT ATTGC 585
(2) INFORMATION FOR SEQ ID NO: 46:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 390 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

ANTCATCCAA CTGGCCGATC AGCAAAAAAG CGCGGCCTAC GATTTCACCC ACGAACTGTT 60 

AACCACGCTG GAAGTTGACG ATCCGGCGAT GGTAGCAAAG CAGATGGAAC TGGTGCTGGA 120 

AGGCTGTTTA AGCCGAATGC TGGTGAATCG TAGCCAGGCG GATGTCGACA CCGCACATCG 180 

GCTGGCGGAA GATANTCNTT GCGTTCGCCC GCTGCCGTCA GGGTGGTGCA CTGACCTGAC 240

AGAAACACAG AAAAGAAGCG ATTTGCCGCA ATCTTAAGCA GTTGAATCGC TTTTACTGAA 300 

ATTAGGTTGA CGAGATGTGC AGATTACGGT TTAATGCGCC CCGTTGCCCG GATAGCTCAG 360 

TCGTAGAGCA GGGGATTGAA AATCCGTTGT 390 



(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 473 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

GGATGCCAGT GTCAGCGACT GGTTAAAGTG GTCGATATCG ATGAGCAAAT TTACGCGCGC 60 

CTGCGCAATA ACAGTCGGGA AAAATTAGTC GGTGTAAGAA AGACGCCGCG TATTCCTGCC 120 

GTTCCGCTCA CGGAACTTAA CCGCGAGCAG AAGTGGCAGA TGATGTTGTC AAAGAGTATG 180 

CGTCGTTAAT TTTATCTCGT TGATACCGGG CGTCCTGCTT GCCAGATGCG ATGTTGTAGC 240 

ATCTTATCCA GCAACCAGGT CGCATCCGGC AAGATCACCG TTTAGGCGTC ACATCCGTCG 300 

TCCCCTGGCA AACGGGGGCG ATTTTCCTCC ATTTGCCTCA GTGGCTGGCG TTTCATGTAA 360 

CGATACATGA CAGCGCCCGA CAAGATCCTG ATACTCTTTG GGTATTCAAC CGTTTCCAGT 420

GTAATTCGTC GTTCACNAAC ATTGGCGTTA CAGGCGGGGC TGGCNGTNAC CCA 473 



(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 482 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

WO 98/22575 PCT/US97/21347

GAAGTGACGG ATGGCTGTGG TTTCTCCATC GGTCACCAGC AGCAGTTNGC ATCATGGATT 60
G C C T A T AAA ' j TCGCGCCGTT CCTCGGNAAA AAAGAGGAGA GCGTTGAAGA CCTCAAATTG 120 
CCGGGCTGGC TGAACATTTT CCACGACAAC ATCGTCTCCA CGCGATTGTG ATGACCATCT 180
TCTTTGGTGC CATTCTGCTC TCTTCGGTAT CGACACCGTG CAGCGATGGC AGGCAAAGTG 240

CACTGGACGG TGTACATCCT GCAAACTGGT TCTCCTTTGC GGTGGCGATC TTCATCATCA 300 
CGCAGGGTGT GCGCATGTTT GTGGCGGAAC TCTCTGAAGC ATTTAACGGC ATTTCCCAGC 360

GCCTGATCCC AGGTGCGGTT CTGGCGATTG ACTGTGCAGC TATCTATAGT TCGCGCCGAA 420

CGCCGTGGTC TGGGGCTTTA TGTGGGGCAC CATCGGTCAG CTGATTGCGG TTGGCATCCT 480

AG 482

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 185 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
GACGACCTGC AGGCATGCAA GCTTGGCACT GGCCGTCGTT TTACAACGTC GTGACTGGGA 60
AAACCCTGGC GTTACCCAAC TTAATCGSCT TGCAGCACAT CCCCCTTTCG CCAGCTGGCG 120
TAATAGCGAA GAGGCCCGCA CCGATCGCCC TTCCCAACAG TTGCGCANCT GAATGGCGAA 180
TGGCG 185

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 491 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:



TAACGCTTCA ATACGCGCGA CCAGCTGGCG GCGCTCATAC GGCGTAATTT TGGCGTCGGC 60

GAGCAAAATC CCTTGTTTAA AGGTATTTTG CCAGCTGCCG TCGTCATATT GGCGAGCTTG 120

CTGACGCGAC TGCGCAGGCA TTAAACGATC AGCACAATCC ATCGCCCGCA GCCAGTAAAG 180

CGGATTGGTT TCGGTTGATT TACCTTGCAG CGCCCAGATG TCGCTACATT CAGTAGAAAG 240

ATAGTCAGCC AGTTGATAAA CCGGAATTTT TTCTTCTGCT GGCGTATCAA TGGCTGGCTT 300

ATTGTGATTC TGCACGCAAC CCAGCAATGC CAGACATGGA GACCCTGCCA GCCACAGCCG 360

TCGGGGCAAT AATCGTTGAA AAATGTGTCG CATATTCACC AGACTTAAAG CCTATCCCAG 420

TGGGCGTAAT TGTTGCAGAC AGTCTGGACA TGGACAGCGC GGAGAAACCG GNAGCGTACA 480

TATCGTACGT G 491

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 106 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
ACTTGAACGG CAATTATTAT TTATCCATGC AACTTCAAGT TGCAGTATCG GAACATTAAC 60 
TTTTCTGGGG TGAATATCAC TCTGATATCG TTTTTTGTAT GCGTNT 106 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 481 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

TTTATGTGCG GTATTGATGG CTGAAGCCTG TAATATCGGA CTGGAACCGC TGATAAAGCA 60 

CAATATACCA GCACTGACCC GCCATCGGCT CAGTTGGGTG AAACAGAATT ACCTTCGTGC 120

AGAAACGCTG GTCAGCGCCA ATGCCCGCCT GGTTGATTTT CAGTCCACAC TGGAGCTTGC 180 

TGGTCGTTGG GGAGGTGGAG AAGTGGCATC AGCTGACGGC ATGCGCTTTG TCACACCAGT 240

GAAGACCATC AACTCAGGAT CTAACAGAAA ATATTTTGGT TCTGGGACGA GGCATCACCT 300 

GGTATAACTT CGTATCTGGA TCAGTACTCT GGGTTCCATG GCATTGTGGT ACCCGGTACA 360

TTACGGGRCT CGATTTTGTA CTGGAAGGAC TTCTTGAGCA GCAGACAGGG CTGAATCCAG 420 

TTGAAATCAT GACAGACANT GCGGGTAGCA GCGATATTAT TTTCGGTCTG TTCTGGCTAC 480

T 481 



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 558 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
TGGNCCGTAA TTCCCAACCA TTTGCCGAGG TCCAGNTTTT TCACCATGTT ACTCGGGATA 60 
GCCAAAACNG ATACCGATGT TGCCGCCGTC CCGGTGCGAG GATCGCGGTG TTGATACCGA 120

TCAGTTCGCC GTTCAGGTTA ACCAGCGCAC CACCGGAGTT ACCACGGTTG ATCGCTGCAT 180 

CGGTCTGGAT GAAGTTTTCG TAGTTTTCGG CATTCAGGCC GTACGCCCCA GCGCAGAGAC 240

AATCCCGGAA GTTACCGTCT CGCCCAGACC AAACGGGTTA CCAATCGCTA CGGTGTAATC 300 

ACCCACGCGC AGTGCATCAG AATCGGCCAT CTTAATTGCG GTCAGGTTTT TCGGGTTCTG 360 

GATTTGGATC AGCGCGATAT CAGAGCGCGG ATCTTTGCCA ACCATCTTCG CGTCGAACTT 420

ACGGCCATCG CTCAGTTGAA CTTTAATGAC CGTCGNGTTA TNAACAACGT GGTTGTTGGT 480

GACGACATAG CCTTTATCGG CATCAATGAT GACGCCGGAA CCCAGCGCCA TGAATTCTGT 540

TGCTGGCCGC CACCATTA 558 
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 263 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
CACCTGCGTG ACGTGACCGA CCTTTTCTCC TCGCTGNTTG TTTCCCCTAT CGTCGGCCTG 
GTCATTGCGG GAGGCCTGAT ATTCCTGCTG CGACGCTACT GGCGCGGGAC GAAAAAAGCG 
TGACCGTATT CGCCGCATTC CGGAAGATCG CAAAAAGAAA AAACGGCAAA CGTCAACCGN 
CATTCTGGAC GCGTATTGCG CTGATTGTTT CCGCTGCGGG CGTGGCGTTT TCGCACGGCG 
CGAACGACGG ACCAAAAGGG ATC 
(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 683 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GTAACGCGTC TGGAAGATGG CCTGCCAGTG GGCGTCGTCG ATGTGGTCGA GGGGCTGGAC 60
GGTTGCCATT CCGCCAATAT CTCACCGGAC AACCGTACGC TGTGGGTTCC GGCATTAAAG 120
CAGGATCGCA TTTGCCTGTT TACGGTCAGC GATGATGGTC ATCTCGTGGC GCAGGACCCT 180
GCGGAAGTGA CCACCGTTGA AGGGGCCGGC CCGCGTCATA TGGTATTCCA TCCAAACGAA 240
CAATATGCGT ATTGCGTCAA TGAGTTAAAC AGCTCAGTGG ATGTCTGGGA ACTGAAAGAT 300
CCGCACGGTA ATAATCGAAT GTGTCCAGAC GCTGGATATG ATGCCGGAAA ATTCTCCGAC 360
ACCCGTTGGG CGGCKGATAT TCATATCACC CCGGATGGTC GCCATTTATA CGCTTGCGAC 420
CGTACCGCCA GCCTGATTAC CGTTTTCAGC GTTTCGGAAG ATGGCAGCGT GTTGAGTAAA 480
GAAGGCTTCC AGCCAACGGA AACCCAGCCG CGCGGCNTCA ATGTTGATCA CAGCGGCAAG 540
TATCTGATTG CCGCGGGGCA AAAATCTCAC CACATCTCGG TATACGAAAT TGTTGGCGAN 600
CAGGGGCTAC TGCATGAAAA AGGCCGCTAT GCGGTCGGGC AGGGACCAAT GTGGGTGGTG 660
GTTAACGCAC ACTAACCGCT GAT 683



(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 282 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

TGGATGCAGG GAAAAACATT GATATTACCG GGGCAACGTG CTCGTCCGGT GGAGACCTTG 60 

GAATGTCTGC GGGTAATRAC ATCAACATTG CCGTAAACCT GATAAGCGGG ACAAAAGTCA 120 

GTCCGGTTTC TGGCACACTG ATGACAACAG TTCATCATCC ACCACCTCAC AGGGCAGCAG 180 

CATCAGCGCC GGCGATAACC TGGGCGATGG CTGCAGGCAG AGATKCTGGG NTGTCACAGC 240

ATCCTCTGTT TCTGCCGGGC ACAGCGCCCT GCTTTCTGCA GT 282 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 697 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

ATGAACGGCC CCCCCCACAG CCCGTTAACA AACGGNTGCC CCGGCGATAA TCGTACTGAT 60 

AAGTTAACTC CAGCAGGCGG TTAATTGAAA GCGAACGGGA GGCTGATGCA TGGTAATAAT 120 

CCCTTAAAAC GCGACGGCAA CGCGCCAGTA AACCGTGAGA TGGTCAGGGG CAAGCCAGTC 180 

CGGGTAAACC AGAGGCAGTC CGGCAGTGAA CGAACCGGAA ACATGACCAC TGGTGGTGCT 240

GAGCCCGGCA GCAGCACCCC ACAGCGTGCC GGACGAGTAC GGGTCATCTC TGTCAGAGTG 300 

CAGCCAGCCG CCGTCCAGTG CAGTCACTGC ACGGACTGTC CCCACATATG GCAGGGAGAA 360

CAGAGACCAG GACAGCTCAT TTCGCAGATA ACCGCCGTTA TTACCGGAGA TATACTGCTC 420

CTTAAAGCCA CGCACTGAAC TCTCACCCCC GAGGCTCAGT TGTTCCACAC CATGAAGACG 480

GTCCG3TGAC CACTGGGCAT AAGCGCTGGT CAGCCACCAC ACCCTGTCCG TGACGGGGCG 540
CTGAAAACTG GCACTCACCG ACCATTTCCG GAACTGATTT ACGGGCAGGT CTCCCCTTTT 600
CCCGTGGTCG CTTTCTGCGC CGAACCAGGG CATCCCCCGT GTGAATACCG GATTCAGTGT 660
TCCGACACCA CCCAGAAACT TGTGTGTGTG ATTCANC 697
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4835 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

TTCGACTGAG CACCACAAAT ACTGGGTATC TCCCCAGATA GTTCATTGCG GTACAAGCAA 60

TATAGGTGCA GAAAGTCAAC CTGCTGCACC CTATTGGATA ATTATATATG GCCTTCAATA 120

AAGTTTGCGG TTGTCGACGT TGGCTATATC AGCCATTTCC AATGCATAGT TCTTTGGTTT 180

AGCACCATCA AGTTATAGAT TTGGGAATAG TTTCAACTGG TATTGATTGA ATTGGGTTTC 240

ATCGTCGATG ATTAATACTA TTTGTAAAGA CTTTATTGTT GATTTCTTAT TATACCACAA 300

ACCCAAACTG GTCTAGGTCA TCATTTGGTG TTGATAACGG GCTCTGATAA TTTCTGCTCT 360

TCTGCTATAC TGGGGATTAT GAAGAATATT AAGGCTGAGT GTATTGAGGT AGTGTTCTTT 420

GAACCGACCA TTCATGACAA TATATTCTTC AATTCGTGAG TGATCCAGCA ACTGGTTGAA 480

TTTAAAACAC TGAGTGATGT TATCCTCTGT AATCGTATGG TTGCTGAACT AGTTGATGTA 540

GCCGATAAGG TTTATACCAG ATATCTTTTG GGGGGATTAG ATAACGTAGC CGCGGATAGC 600

AAACGAGATA GTTGAATTTT ATTACCGTAA TTTCTTCCAT TGAGAAAAGC TTATTTTTCT 660

TGGTGGTATT CGCAGTTATG TATCTTCCAT AAAGACTTGG GAATATCTTG CTTGAAARGC 720

TATCTGGAGA TAGCCTTAGT TATTTGATAA ATATTTCAAA TAGGAGGAGC CGTATGGCTG 780

TCATTTATAC CCTCACTAAA TCGTCACTTG TCAAGTCTGG TGGTCAATTA CATTGGAATA 840

TTGATTCGCC ATCAGAACAA CAGCCACAAA AGATCGTCAA TGGTCGGGTT GCGCTTCGGG 900

GATGGTTACT GGCAGATGTG GAAAAAGATC TCCGTGTTGC GGTTAAAATT GAACATTTGA 960

CATACAGTTT TCCCTTCAAT ATAAAGCGCC CTGATGTTAT TTCAGCTATA CTGAAACAGC 1020

CACCTGAAAA ACATCAAAGA CTTCATTGTG GATTTGATAT CAATGTCCCA TTTTCTACTA 1080

AAATAATTAT TGGCCTTGAG TCTGATGGGT TGATTACCTG GTTGGAAGAG TTATTATTTC 1140

TCCTGCCTGA TAATTGAATT AAGTATCTAT ACCGATAGTA TCGCGATAGA TATATTTTTT 1200

TACAGGATGA TAATTTGAGA ATCTATATAG CCGCTATTAT CAAGGATGAG TATTCAAGTT 1260

TACTTGAATG GATTGCCTAC CATCGAGTAT TAGGTGTTGA TGGGTTTAKT ATTGCAGATA 1320

ATGGCAGTCG TGAWGGTAGC CGAGAATTAC TATTTTCCCT CGCTCGCCTA GGTATTGTGA 1380

CGATGTTCGA ACAACCGAC7 TTGGTGAATC AAAAGCCACA ATTACCTGCA TATGAACATA 1440

TTTTACGTAG CTGTCCCAGA GACATAGACC TGCTTGCATT TATAGATGCT GATGAATTTT 1500 

TATTGCCACT TGAATCGGAT ACCAATTTGT CAGATTTTTT TTCTGAAAAG TTTCAGGATG 1560 

AGAGTGTCAG CGCTATTGCA TTGAATTGGG CAAATTTTGG TTCTAGTGGT GAATGGTTTG 1620 

CTGAAGAGGG GTTGGTTATT GAACGTTTTA CCTATCGTGC CGCGCAATCC TTTAACGTTC 1680

ATCATAACTT CAAAAGCGTG GTCAAACCCG AAGGAGTTAA CCGCTTTCAT AATGGGCATT 1740

ATGCTGATTT GCGTTATGGT CGATATATCG ATGGATTGGG TCGTGATTTG ATTGTGCACC 1800

CGAGGCATGG TAATGGGGTT AGTGCTGAAG TGACTTGGAG CGGTGTCAGG GTAAATCACT 1860

ATGCAGTTAA ATCACTTGAG GAATTCTTGT TGGGCAAGCA TCTGCGTGGT AGTGGTGCCA 1920

CTGCTAATGG AGTAAAGCAT AAAGATTATT TCAAGGCACA TGATCGTAAT GATGAAGAGT 1980

GCCTTCTGGC TGCCGGATTC TCAGAACAAG TAAAAGCTGA AATGGAACGA TTAAGTGTGA 2040

AGTTGACTGA GTTACCAGCA GTTGAACCTA TTCCTAGTGG TTCTTGGTTC AAAAAAAAAA 2100 

TGAAGAAATG GATGGTTTGA ATATATTGAG CAAGCACTTT GGTATTTATT TCTGGTCTTA 2160 

TCTACAGGTC TGCTAATAAG GATCTGTATC CCCCAGGTGT TAGCTTGGAC TGTAAGTTAT 2220 

ATTATGTGTA GCTATTGCGA TTGGCAGCGT GTGACATTGC CAGACTCGTT TTCTCTTCAT 2280

TCTGGTTGGC TTCTGATTCG GGGGCGCGTG TTGACGACTC AAACTCGAGG TGAAACTCGT 2340 

CTGCGCTGGC AATGCGGACA AGGAATATGG CATGAACAGA AGTTGCCGGT CACTGGTCGA 2400

GGCACGTTGC TGGAGCTGGT TTATCTACCY TCGGGAGCTA GTCATTKGTC TTTGCTGGCA 2460

AGTAATAAGG GCGCTGAGTG TAATGTTGAA ATTACTCAGC TTTGTTGTGT ATCCCGTGCC 2520 

GAGAGTCTCT GGCGTCGATT GCGCCGGGTT GTACCTTTTT ACCGACGCTT AACGAAGTCC 2580 

AGACGCAAAA GGTTAGGCCT TTCATGGCAT TTGTGGCTCA CGGACTTGCA GCAAGCTTAC 2640

CAACTTGTCA GCAGAGTTCG CGATGATAAA CCACTCAATA GCTATGATGA GTGGCTAGCA 2700 

GACTTCGACA CCCTTGAACC CGCCGAATAC AAGCTGATTA AGCGCCAGCT GGCTGGCTGG 2760

GGCACATTAC CACGTTTCTG TTTGCATCTT GTTGGCGTTG GGGATGAACA GAGCCGCCAC 2820

AAGACCCTGG AGAGTATTCA GGCACTCTGT TATCCGGCAA GCAATATAAA GCTGCAGGAG 2880 

CATGGTGCAT ATCCAGAAAT CTCCAGTCAG TCAAGCGGCG AATGGCAGTG GGTGTTGCCT 2940

GTAGGGGCAG TGGTTTCGCC AAGCGCCTTA TTTTGGGTTG CCCACCAGTT AGGCCAGAAT 3000 

CCTGATTGTT TATGGATATA CGGTGATCAC GATCTGCTTG ACGAGAGAGG TGAACGTCAC 3060 

TCTCCCAACT TCAAACCTGA TTGGAATGAA ACGCTGCTAC AGAGCCAAAA CTATATTAGT 3120 

TGGTGTGGTT TGTGGCGTGA ACAAGGTGCT GGCCGTGTTG CCTTTGATGC GGCGACATGC 3180 

CATCAGTGGT GGCTACAGTT GGCAAAGATG TGTGAACCGA AACAGATAGT CCATATTCCA 3240

TCATTGATGA TGCATTTGCC TGCAAGAGCG TTGATTTCGG ATGATTTTGA GTCGCTGAAA 3300 
GATAAAGAAG ATTTACTGCC ATCAGGAGTG AGCATTGAGG CAGCACCTCA TGGTGTATGT 3360

CGTTGGCGCT GGGCGTTGCG AGCGCAATTG CCATTGGTTT CAGTGATTAT CCCTACTAGA 3420

AATGGTATTG CTCATTTACG CCC7TGTATC GAAAGCCTGA TAGAAAAGAC GCAATATGCC 3480

AATATGGAAG TCATAGTGAT GGATAATCAG AGCGATGAGG AGGAGACGCT TGCTTATCTT 3540

GCTCATATCG AACAGGTTTA TGGCGTTAGG GTGATTTCTT ATGATCAACC GTTTAACTAT 3600

TCAGCCATCA ACAATCTGGC AGTGAGAAAC GCACATGGAG ATATGATATG TTTGCTGAAT 3660

AATGATACTC AGGTAATCAG TATTGACTGG CTGGATGAAA TGGTTTCTCA TTTATTACGC 3720

CCCGGCGTGG GTGTGGTAGG AGCAAAGCTG TATTACGGAA ATGGCTTGAT TCAGCATGCA 3780 

GGCGATGCTG TCGGCCCTGG CGGTTGTGCA GATCATTTTC ATAATGGTTT GTCAGCTAAC 3840

GATCCTGGAT ATCAGCGTAG GGCTGTTAGT GCCCAAGAGC TGTCAGCTGT GACTGCAGCT 3900 

TGTTTATTGA CTCATAAAGA GTTATATCTG GCGCTCGGAG GACTTGATGA AACGAATTTG 3960 

CCGATAGCTT TTAATGACGT RGATTATTGT CTCAGAGTTC GA3ATGCTGG CTGGAGAGTA 4020

ATCTGGACTC CCTTCGCTGA ATTGTATCAT CATGAGTCTA TTTCCCGTGG TAAAGATGTA 4080

TCAAAACAAC AGCAGATACG AGCGAAATCT GAGTTGCGCT ATATGAAAAA ACGATGGGCA 4140

TGTGCACTTA AACACGATCC AGCCTAGAAC CAAAATTTGA GTTATGAACG TCCTGATTTC 4200

TCTTTAAGTA GAGCTCCTAA TATAGTATTG CCATGGATGA ATTAATTCGC AGGAAACTAT 4260

TTAAGCCTTA TCGTAAATTA AATAAACAGA GTTATAGAAG TCCGCAAAGC TCTGAGATTA 4320

ACTTTGAACG ATTGTTTATA TTACATGAGG GAAAATCACC TACATTAGCC TATTTTGAAT 4380

CGGCTATTAT AAGTCGGTTT CCTGATGCAG AATGTCATTT TATCGACACA TTAGCATCCA 4440

CTGATATATT TATTCCTAGA GGATCTGCCC TTGTCGTCAT TAGATTCATC TCCCCAAAAT 4500

GGCAACAGCA CATAGAAAGA TATAACGACA GGTTTTCTCG AATTGTTTAT TTTATGGATG 4560

ACGACCTGTT TGACCCGACT GCACTATCTA CGTTACCAAA AGAGTATCGT ACCAAGATAA 4620

TAAGGAGGTC GGCGGCTCAG CATCGATGGA TTACGCAATA TTGTGATAAC ATTTGGGTTT 4680

CAACTGCCTA TTTGGCTAAT AAATATGCAC ATCTTAACCC GGAGATTGTT TCTGCTAAAC 4740

CGTCACTGGC ACTCATTGAA ACACATCGAT CAGTAAAAAT CGCTTATCAT GGCTCAAGTT 4800
CTCATCGGGA AGAAAAATAT TGGTTGAGAC AAATC 4835
(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1746 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

GAAAAATGKC ATAACCGCAT TCCATCAAGC CCGTNAATAT CCCGGACTTT CATTTATTTC 60

TGAGGCGTAG AGGGAAGCAA TAACTGCTGG TCAGATATTG C7GTCTCCGG TACATTTACC 120

TGACACTGTA TTTTTCCATC CCAGTTTACC GACAGGGTTT CCCCCGGCGT CACGCCACTC 180

AGCCAGGCAA GGCCTTCGTC GGCCACCATG CCCAGTTCCC GGCCTTTTTC ACTGGTTACA 240

CTGGCACCAA AGGGGGGCTG AGAGCCATCA GCAAGACGCA GTATTGCAAA CAGACGTTTC 300 

CCTTTAAGCA CGCTGAATTT CCGGTAACCA ATGGCACCTT CTGTCAGCGC CGATTCCACA 360

ACAGAACGGG TTGCTTCCAC ATCATCCGGT AAGCGCTTCA GGTCAACAGA GGTTGTATTC 420

CGGTAATAAC TGCTGATGTC AGTCACCACG CCCGTTCCCC AGCGATTTGT CACCACCTGC 480

CCGCCATCAA CCGGTACACC TCCCACACCA TCCGTGTCAA CAAGAAGACG TGTTCCACCG 540

GACATTCCCC CTGCATGTAA CGCCGCACCT TTTCCGGTAA TTGTTGCCCC ACCGGAAGCA 600 

CTGACGCCGA AAGACGTATA TCCTTTCTGC AGGGATGCAA TATTCGCGGA CAAATTTGCC 660 

AGCGGACTAC GATGACTGTA ATAGGCATTA ATCTGACGTT GCGATGTCAG TCCACCGCCA 720 

CTGTTAAGGC CGGCGTTCAG GCTGTAGCTG TCCAGACCGT CATTGAACGT GWCAGTGTAG 780

CCGGCCATAT TCACATAACG GTCATTACTC ATACTGCCAC TGTAGCTCGC TGTCCCCGTC 840

CCCCAGCGGC ACGGATATAC GCAGGTAAGC AGAATCNTTA TCACGCCCCA GATATTTAGA 900 

CCTTGAGGCT GACAATCCAA CCGCCACACC CTGCAGTCCG AAAACATTAA AGTAGCGGTT 960 

GACGCTCACC GTATAATAGT CCGTTTTCCG TATGTCCCAG TATGTCTGAC GGCTGTACTG 1020 

CAGGTTAAAA GAGGTGTTCC AGTCCGCCAC GTTTTTATTC AGCGTAACGG TATAGATCTC 1080 

TTTTTCCCGA CTGCTGTAAT CATTACGGTA GCGGGCGTTC AGGTACTGCT CCATGGTCAT 1140

ATAGTTTCGC TCTGAGAAAC GATACCCGGC GAACGTAATG TCGGCATCCG CATTATCAAA 1200 

CCGTTTGGAG TAGCTCAGAC GCCAGGATTT TCCCTGAAAC GTTCTCTCTC CCTCAATACG 1260

GGCTAGTGAC TGCGTGATAT CAGCGGAAAG GGTCCCCGGC ACACCCAGGT CCCAGCCGGC 1320 

ACCGGCTGCC AGTGCATTAT AATCACCGGC AAGCACAGCC CCGCCATACA GCGACCACTG 1380 

GTTACTGAGC CCCCAGGATG CCTCTCCGGT CGCAAATACA GGCCCTTCGG TCTCATGCCC 1440

GTATCCACGG GAACGACCGG AGACAAGTTT GTACCGGACC TGTCCCGGAC GCGTCAGATA 1500 

AGGAACCGAG GCCGTATCGA CCTGAAAGTT TTCTTCCGTC CGTTCTGTTC AATAACCTCA 1560

ACATCAAGAC GTCCGCGAAC TGAACTGTCC AGGTCCTGAA TACTGAATGG CCCTGCGGGG 1620 

ACCATCGAGT CGTACAGCAC CCGTCCCTGC TGCGACACCA CAACACGGGC ATTAGTCTCC 1680

GCAATCCCGG TAATCTGCGG TGCATAAGCC TTCGCATTCT TGGGGCGGCA CATTCCGGGT 1740 

CAGCGN 1746
(2) INFORMATION FOR SEQ ID NO: 60: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 723 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 

TGTACTGAGC ACGGCGAATA TCCAGTGTTC AAATTCCACT TTGCAGCGAC TGCATGATGT 60 

CTGCGGCGCG GTAACAATCA GGGCATTACT GTGTTTGCTG GCGGCGATGG AGACAACCTC 120 

ACGCCCGCTA CCGACCGTGC CTTCCGCCTC TTCTTTAGCC GCCGTGAGCG TGCCGCTGAC 180 

CTGCTTCAGC ACATCGACCA GATCTTCGGC TTTGCTGTAT TTGAGATAGA AAACCTGGCT 240

GTTGCCGCTG CGTTCCATTT CTGAGTCCAG CCGACGGATC AGGCGGCGCA TTTTGTCCCG 300 

CGTGGCCGGG TCACCACTGA CAATCACACT GTTGGTGCGT TCGTCGGCGA CAATTTGAGA 360

TTTCAGCGTC GCAGGCTGGT TCTCGCCGCT GTTTTTAGTC AGGCTTTCCA GCACGCGGGC 420

GATTTCCGAA GCAGAGGCGT TATCCAGCGG GATCACCTCT TCAGTGCGAT TANCCGCGTG 480

ATCCACACGC TGGATCACTT CCGTCAGCCG CTCCACGACG GAGGCGCGCC CGGTGAGCAT 540

AATCACGTTG GAGGGATCGT AATTAACAAC GTTGCCTGAG CCTGCGCTGT CGATCATCTG 600

GCGCAGAATC GGTGCCAGTT CGCGTACCGA AACATNACGT ACCGGCACGA CTTTGGTGAC 660 

CATTTCATCG CCCGCGTATT GTCGCTGCCT TCACCAACCA GCGGCAGGGC TCGACTTTCG 720 

CGG 723

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2556 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

TAGAGGATCC CCGGCGTTGC GATCGTCACG AACATAGACC CACAKCCGTC CGGTAGGTAT 60 

TTACCCTGAC CCGGYTCCAG TACATTTACC GGCGTGTCAT CGGCATGCAC TTTACCCGGC 120 

ATCAGCACAT AGTGCTTCAG TTCATCATAC AGCGGGCGAA GCTGCTCTCC CATGATGTCA 180 

ACCCAGCGCC CCATCGTATT GCAGTGCAGC TCCACGCCCT GGCGGGCATA GATTTCCGAC 240

TGACGGTACA GCGGCAGATG CTCGGCGAAC TTAGCCATGA TTATGCGGGC CAGCAGAGCC 300 

GGACTGGCGT AACTGCGCTC GATGGGTTTT GGTGGCTGCG GAGCCTGAAC TATACAGTCG 360 

CACCGGCTGC AGGCCAGTTT TGGGCGAACC GTTTCGATTA CCCTGAACGC GGTGTTGATG 420

ATATCCAGTT GTTCAGAGAT GCTTTCTCCC AGCGGTTTCA GTTTGCCGCT GCAGACGGGG 480
CATTCGGTTT CTGCCGGGGA GATAACCTGC CTGTCACGGG GAAGTGTTGC CGGAAGTGCT 540

TTGCGGACGG GAGAGTCTGA TGTTTTCGGG GCTGTCTCTC CGGGCATTGA GGTGAGTTGC 600 

AACTGCGCCT CACCAAGCCT GTTCTGGAGC TCGGTTATAC GCGTTTCTGC CCGTGCGATC 660 

TTCTTTTCTA TC7TCTCGCG GCTTTTCTCG CTGCTGCGAC CGAACAACAT TCTCTGTAGT 720 

TTAGCGACCA GCGCTCTGAG TGAGCTGATC TCGCGGCATA GCCGGTTATT TCACCAGACA 780

GACGGACGAT AACAGCCTGC TGTGCGATCA GCAGGGCCTT CAGTTGCTCG ATGTCGTCGG 840

GGAGTGTGTT GTTCATTCCC CTGTTTTATC ACGGGTTATA TCCGGATGCC AGGCCGTTCT 900 

GTCCGTTTGG 3ATGTTGCCA CGCGATCCCC TCCAGTAGCA TGGATAACTG AGCTGGCGTC 960 

AGGTGCACTT TCCCTTCCCG GGTTACCGGC CAGACGAAGC GGCCCCGTTC CAGGCGTTTG 1020

GCGAACAGGC ATAACCCGTC ACGATCGGCC CACAGTATTT TCACCATTTT GCCACTGCGG 1080

CCCCGGAAGA CGAAGATATG CCCGGAGAAC GGGTCATCTT TCAGCGTGTT CTGCACCTTC 1140

GAAGCCAGGC CGTTGAAGCC ACAACGCATA TCTGTGATGC CAGCGATGAT CCAGATTCTG 1200 

GTACCGGTTG GCAGCGTTAT CATCGGGTAC CTCCTTTTAT TTCGCGGATT AGCGCCCGTA 1260

ACATTTCCGG AGTGAGAGGG TCAAACAGTT TTACCACACC TGATTTAAGA TGCAGCTCGC 1320

ACCGTGGGAC GTTTCCGGGA TCACACTCAG GGCACTCATC AGGCTTGTTA CGCCAGAAGG 1380

GATTTGTAAC TGGTCTGGTC GGCTCTGGCG TATCAGTCAG AGCCACCGGG ACAGGCATGC 1440

ATTCCTGTAT GTCATCATCG CTCAGTAAGC CGTCCTCGTA CTGGCTTTTC CATTTAAACA 1500

GCAGGTTATC ATTGATACCG TGCTCTCTGG CGATCCGGGC AACAACAGCA CCGGGCTGTA 1560

ATGCCTGCTT AGCCAGACGG ACCTTAAATT CACGGCTGTA GCTGGCTCGC CGTTCTTTTC 1620 

GCCATGTGCC TTCGCTGATT TGAGGCTCTG TTAATTCCTT CTTTCTGTTG GCATAAAGGA 1680 

TGGCGTCAAG CTGAGCTAAT GAAACTGAAT CGGGCAATGG CCATGCGATA CCGGATGCAA 1740

TAAATCGCTG AAAAAGCGTA TGTATTGTGG AATGACTGAG ACCTAGACGC TGAGCGATGG 1800

CCCGGATGGT CAGTTTATCT TCAAATCTTA AACGCAGAGC ATCAGGCAAA TAAGAACGGA 1860

AGCAGGGAAT ATCTTTTTTT GTCTGGGAAT TCATCGTTCG TGTCCATCTA TATAGATGGG 1920

CGCGATTGTT GCCAGACAGG ACAATTTTCA CAAGACGTCG CAGATGGGGC GCTTACCAGA 1980 

AATGCGCGGG TACGACAGTG ACTCGTCAAA TCTCAGTTGT AGCACACGCG GGATCAATTC 2040

CGGATTGTCT GCCAGTACCG CCTTTCGTGC ATTCATCTTA AATGTCCCTT TACTGCAAAA 2100 

ATGGACATTA GTATCGGAAA CAGGAAAGGG AGGCGAAAGA CGGTTTAAAT GAGACGGTTA 2160 

CCATTGTGTC GGGCTGTGTA CGTTCTCCCC GGACAGACAG CCTCAGTTCG TAGAATCTAT 2220 

AAATTACTGC TACTGATGCT GCCGGGGAAA GGCGTAACGA AAAAACAGCC TCCGTTACCG 2280

GACAGCAAGG AGGCTGAATG GAGTTTACAG GATTTGCTTT TTTATAATGT CTGGCCATGC 2340

AGTAAAACCG GACAGGTTTT ATTATCATGT GAGGTATTCT GACATAAAAT GCTGGATTTT 2400
TATTTTGTGA CGAATGCTGC AAAATTGCAT CTGCACTCTG ATGTAGCTTT TATCTGTTTG 2460

AGTGAAGCAT GCCCACAAAC TGAGTTATTA AGTTGTGGAA GAACAGTTTT GTCCCGCCTG 2520 
CATCTCTCCT TTCAAAAACC AGTATGTCGC CATGCC 2556
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 790 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

CAGTTAGTGT TAAAAAATNT CCTCTGCTNC AGAAATTACA CCCACCAATA TACAATNATT 60 

AATAAATTTT CGGTTGGGTT AGGTAATGGC TGGGATTCGA TAATATCTCT TGATGGGGTT 120 

GAACAGAGTG AGGAAATATT ACGCTGGTAC ACAGCCGGCT CAAAAACAGT AAAGATTGAG 180 

AGCAGGTTGT ATGGTGAAGA GGGAAAGAGA AAACCCGGGG AGCTATCTGG TTCTATGACT 240

ATGGTTCTGA GTTTCCCCTG AATAAGATGA TGGATTATCT GACTGGCTGT TCATCAGTCG 300 

GATAATGATG AAAACTGATG AGCAACAGGT TGTGCGGGCA ATGTGCAGGA TCCGTCACCA 360

AAGGGTGGAA GTTGCGGGCG ACTCAGATAA ACGGGTTACA TGAGCTATTT CTGGAGTTTG 420

ACGAAGCCGT CTGGAAGGGA GAAGAGGCGA TTCCATTGAT GTCTCTGGAA AACATCTGTC 480

AGTCGTGCTG CTGGAAATAT TGATAGAGCA ATGGGAATGG TTATCCAACA TTGATGAACA 540

TATTGTATAT TTACAGAAAT TTTTAAAAAC AGGACTCAGC AGGTTAAATC GTGTAAAAAT 600 

TACTCATGAA TACCATTATG GGCTTACAAA GCGATGTGGT TAAGCAGATC TTATTCAGGC 660 

CTGTGCAGCG TAGGATTACA ATAGGATCGA ATAACGCCAT ACAGGGGAAT GGGAGATAGG 720

CTGATTCATC CTGTGGCTAT AACCAGGAGC ATATCGGGAA TCMANTATGT TACCCCAGAT 780

GGAACACCAT 790 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10906 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

GCGGCCGCAG TACTGGATCT CTTTGCGGCA TGACGATGAG GGGGAGAGAA ATAAACTTAA 60

CCCAGTCATG GCAGATGAAG AACAGGCTTA CGTAAAAGGG TTATATGAAG GGATTATGCT 120

GATTGGTAAT ATAATCAATA AGCCTGAAGA AGCTAAAGCG TTAATCAAGG CAACTGAAAA 180 
TGGCTGCAGA ATGGTGAGTA ACCGGCTGCA ACTTCTACCC GAAGAGCAGC GTGTTCGTGC 240

CTATATGGGG AATCCTGAAT TGACCACTTA TGGTTCCGGA AAATATACAG GATTAATGAT 300 

GAAACATGCT GGCGCAGTAA ACGTCGCCGC TTCCACCATT AAAGGTTTCA AACAGGTCTC 360

GATAGAGCAA GTCATTGAAT GGAATCCTCA GGTAATTTTT GTGCAGAATC GTTATCCTGC 420

TGTAGTGAAT GAAATACAGT CAAGCCCACA GTGGCAGGTA ATAGATGCTG TCAAAAATCA 480

TCGTGTTTAT TTGATGCCAG AGTATGCCAA AGGATGGGGC TATCCGATGC CCGAGGCTAT 540

GGGGATTGGG GAATTGTGGA TGGCGAAAAA GCTGTATCGA GAAAAATTCA ATGATGTTGA 600

TATGCATAAA ATAGTCAATG ACTGGTATAG AACGTTTTAC CGTACTGATT ATCAGGGTGA 660 

AGACTAATGC GAGTGCTTGC TGCGGGCAGT TTACGCCGGG TATGGAAATC ACTTGTGTCA 720 

GAGTATCAGG CCGATAATAT ACAGTGTGAT TTTGGACCAG CGGGTATATT AAGGGAGCGT 780

ATTGAGGTGG GTGAGGCATG CGATTTTTTT GCATCAGCCA ATATGACTCA CGCACAGATA 840

TTAATGTCCG CAGGANGAGC ATTGTGTATT AAACCTTTTG CCAGAAATCG TTTGTGTTTG 900

TATGTTCGGG CGAATAAATT CAATGAGAAT GACGACTGGT ATTCTTTATT AAATCGGGAA 960 

ACATTGCGAA TCGGAACATC AACGGCGGGA TGTGATCCAT CTGGTGATTA CACTGAGGAA 1020 

CTGTTTGAAA ATATGGGGAG TGTCGGTGAA AAAATAAGGC AACGGGCTGT AGCATTAGTT 1080 

GGGCGGGAGG CATTCGTTTC CTCTTCCAGG AAATGCGATA GCAGCGCAGT GGTTAATTGA 1140

AAATGATTAT ACTGATCTGT TCATCGGTTA TGCCAATTAC GCTCCTGGCT TGCAATCAAT 1200 

TGATTCAGTA AAAGTTATAG AAATAGCGGA ACCTTATAAT CCGATTGCTA TCTATGGATT 1260 

TGCCTGTCTG ACCGATAATG CCCTGCCACT TGCCGACTTT TTAGTTTCAC CTGTTGCCAG 1320 

AGGTATACTT GAACAGCATG GGTTTATGCC TCCAGGTACG TTATAGCCCC CTGTCTTACA 1380

GCTGTCTCTT GATCAGATGT CCTGATCAAG AGACTTCATC ACCAGGTAAC CCTCAACCAT 1440

ATCCTGCATA TCCTGAAGTC TGAACCAGCC ATCCCACATA ACTACCCAAC CGGGGCGGCC 1500 

TGTGCGTTTG CTGTCATGCC ATCGCCCCAG TTTCGCCAGT TTCAGACAGG CCCATTTCAG 1560

TGTCGGCGTC TGTGACGGAA GCGGTTTTCC TTCCAGCTTA ACCCACAGCA GTTTCCACTC 1620 

TGTCGGCGTC AGTATTTTCT TACAGCTGTC ATTTTGTGTT TCTTCACTGA TACCTCCCTG 1680 

CCGCAGGCCA GCACCCGTAC CGCGATAAAC GCCTTGATAA CCACCATGCG CTCAAGGTTA 1740

TCCCGGGTCT GCATTCGCAG CGATTCCACA CATGTACCAC CACTTTTCCA CGCCTTGTGG 1800

TATTGCTCTA TCAGCCAGCG TCGCTCGTAA TGGCTGACGA TACGTCGCGC ATCGGCGGCA 1860

CTCGCCACTT TTTCTGACGT CAGGAGATGC CAGCAGGCAC CGTCCTCTGC CTGCTCCCGG 1920 

CAACAGACAT ACGTGAGCGG GAGCGCCTGG CCGCTGTTGT CGGGATTTTT TATGCTGACT 1980

TCGTTGTAAC TGATGAACAT CCGGGCCTGG CGGGCTGCCC GCCCGCCTTT TTGCATCACA 2040

TTCAGCGTGT GGCTTCCCGC GGTTGCCAGG ACTTCCGGCA GTTCGAAGAG CTTGCCGGGT 2100 
GCTTCTTCCA GCCGGCGATT CTGTGCAGCA CGCACCACGA AGCGCTGTCC GTGGCTGACT 2160

TTATAATGCA GGTAATGCTA GATATCCGCT TCCCGGTCAC AGACAGTGAT TACCCGTTTC 2220

TGTATCTCCC CCAGCCGTTC GGCCATACGC TCCGAAGCCT GCTGCCAGCG GTAACTTTCT 2280

TTTTCTTCAT AGGGACGTTC TTTTCGCTGG TGCTTAACAC CATAGGTGTC CGTGACCCGA 2340

CTCCAGCGCT GCTGTTCGAT AAGACCGACT GGCAGGGCGC TGTCGGGGGC GTACATCAGG 2400

ACAGAGTGAG CCAGCAGCCC GCGCGTCTTC GGGTTAGTGG TGGTATTGCC CAGGTCATCA 2460

GATGCCGTAC TGTGGCTGAA GTTAATGGTG GTGGTGTCTT CCAGTGCGAG GAGCAGCGGA 2520

TGAGCCTCAC ATGCCCTTAC AGTGGCGGTA AATGCGGCTT CGGCAATGGC TTGCGGGGAC 2580

ACAGACGGGT TACGTATCAG GCGGTACGCA CCTTCAACCT GAGCAGTGGA CTGGGATGAT 2640

TTCACAATAG AAAGACCTGC ATGCTGAGCG AGAGAAGAGG TCAGTGACAC AAGGCGTCGT 2700

GTACGACGCG GATCACCGAG ACGGGCATGT CCAAACTGCT CGTTAGCCCA TGAATAACAA 2760

TCAGAAAGTA CCATAACAGA GTCGAATAAA ATGAAATATA AGAGAAGATC AACGGGTGAA 2820

GAAAAAGTTC AAAAAATGGC TACCGGGGAG GAAGGAAAGT ACCGGATGGA AAGAGCCCCC 2880

CTAAAGCAGA CTGACAGACA TCACAAATCC CCGGGGGGGA CTTGTGTATA AGAGACAGGT 2940

CTTACAGGGG GAGCGTCCGT CTTTTTATCA ACATCAGGCA ATGACATAAC ATTATGAACA 3000

AGCTCACAAG TCTGATGGTT AAATTTTATA ATGCTCCTTA CTAAGACCGT ATTTTTTCAT 3060

TCTGAGATAG AGTTTTTTCC GCGGGATTTG TAAATATTCA GCAACCTCAT TGATACGCCC 3120

CTGATGGATA TTAAGTGCCT CTGTGATTAT CTGTCGCTCA GCGTCCTCCA CTCGTCTGTC 3180

AAGCGGTGTC GGGGTTCCGA CGTGCATCAA CGGATTTGCT GTTTCTGCCA GCGGTAATAC 3240

TCCTACAGTA AATAGTTCTG CTGCATTGGC CAGCTCTCGC ACATTATTTG GCCACATGCG 3300

GCGCATCATC TCTTTGAGCA TCTCTTTTCC CACTTCCGGA ACAGGATGGT TAAGCCGTTG 3360

ACATGCTTTA CAAAGGTAAT GGCGAAACAG TGGTTCAATA TCATCGGGGC GTTGAGTTAA 3420

TGGCAGGCAA GCGATTTGTG TGATTGCAAA GCAGTAATAG AGCTCCGCGA TGATATGGTT 3480

GCTGGCGGCC AGCTCGACCA GCGAAGTGTC TCCAATACGA ATCAGGCGAA AAGGTCGGTG 3540

TTCCTGGCTT TGTAACTGAA CCAGATGGTA CTGCTGTTCA CGCGTCAGGT GTTCAGGATG 3600

GCTGAGCACT AATGTTCCCC CCTGAGCCAG CGCAATGAAA TCATTAAGCT GTGGTGCATT 3660

GTCTGGTGTC AGCTCGCGGT AGATAAATTC GCCTTGTGCA TTACGTCCAA ATTGGTGCAG 3720

ATAACGTGCA CCGGTCATCC GTCCTGTGCG TGGGGCACCG TAGAGCCAGA CGGCAATATC 3780

TGTTTCAGAC AACTGCTGTA AACGTCGCCG ATACTGATTT ATCCATTCAC TTCTCCCTAT 3840

CAACTCCACC TGCAACGTCT GTTGGCAATA CTGACGACGC GCAATGATTG ATTGACGCTG 3900

GCGTAGCGCC TCTTCAACCA GAGAAAGCAA TTTGCCGGGA TCAACCGGTT TTTGCAAAAA 3960

ATCCCACGGG CCTTTTTTTA CCGCATCAAC TGCCATTGGC ACGTCGCCGT GCCCGGTAAT 4020

AAGCAGAATG GGGATCTGTT GATCATCCTG GTGAAATAAC ATCATCAAAT CGATACCAGA 4080

GCAGCCAGGC ATACACACAT CACTTAGCAC AATACCTGGC CAGTCTGGTT GTATCCACGT 4140

CTGCGCCTCA AAAGGATTGT TACAGGCAAA AACCCGATAG CCTGACTGTT CAAGTAACTG 4200

TGTGTAGGCG TCCAGCACGT CAGCATCATC ATCAATCAGC AGAATCGAAT ATTCACTACT 4260

TAGCATCTTC CACATCCGTT AGTCTGAATT GCAGTACCAC ACAGGCATTC CTGGTCATCG 4320

TTGATGCCAG CCGTAATTCA CCTTTCATTT GCTCCATCAA CGACACACAA ATTGAAAGAC 4380

CAATACCCAG TCCTACTTCT TTACTGGTGG TAAACGGCTT CAATAACGAA GGCAACAATG 4440

CCTCAGGCCA GCCCGGGCCA TTATCGCCAA TGAATACGTT CAGCGTTTTA CCCTGCATTT 4500

GCCAGTTAAC GGTAATGACA GCGCCTTGCC CACAAACATC AAGCGCATTC GCCAGTACGT 4560

TAACCAGTAC CTGCTGGGTT CTGACCTCAT CGCCTGAAAC TGTGGCTGTA CCTTGCGGCA 4620

GAACAAGCGT AGCTTGCAAA GGGCGATGAC GCATGGCCAG AAGTTCCCAG GCCGCACTGA 4680

ACATCTGTGC TAAATCAACG GAATGGAGTG ATATTTCCAG TTCGGCGCGC CGGGTAAACT 4740

GCCGTAGTGA ACGGATAATG GCGTCAATGC GACCAATCAC CCCTTCGGCT TTACCAAGCA 4800

TCATGCTGGC CTGTTCTGTC TGGGTCTGTT CAATGCCTGC GGGCTGTAAA CAGATACATC 4860

GACAGCGCAT TTAGCGGCTG ATTGATCTCG TGGGCCAGCG TGGTCATCGT TTGCCCGACT 4920

ANCCGCAGCT TCGCTGTCTG AATCAGTTCG TCCTGGGTGG CTCGCAGATC GGCTTCTATC 4980

ACCTTTCGAT CGGTAATTTC TTGTTCAAGT TGCTGTTTTT GCACATTGAG CTGCCCGAGA 5040

GTATGGCGTA ATAATCCTGC AATTCTCCCC AGTTCATCAT TCCCATAAAC AGGAATAGCC 5100 

GTTTCCGTGC CTCCCAGACC AATTTGCACA ACGGCCTGAT TCAGTAGGGT AAAGCGTTTC 5160 

ACCAACCGTG AGCGGATAAA ATAATGGTTG AATACCCATG CCAGCAGTAA CGCCAGTGCT 5220

GTCGCCACCA GGATCAGCCC ACCgct AACG CGAACAATTT GTTCCATTCG TTGATTAAAC 52 80 

ATCTGCATTT GTTGATGAGT ACTGCcAAGT GCGCTTCCAG TAACGTTCTG AAGCGACCCA 5340 

GTGTCGCTTc CCTGGTGCGA CTGGCATCCT CTAAGGCTTT TTGGGCGGTG ACATATTCAC 5400

GCATCGTAGC CGGCATTTTG TTTTTTACGA TTCCCATATC CAGCAATTCA TCGATAGTCT 5460

GCCTCAGGGT AATGGTGCCA GGCCAGTCAT CCAGCATACG TATATTTTCA TCTGCCGTTT 5520 

TTTTCAGATT TTCAAAATAA CGGAGATGAG TTTCCACCTG TGTGTCGTCA TCACGTCCTG 5580 

ATTTGAGTTC ATTGAGTCTG TCACGCAGAT CGTCAACAAT CTGATTTTCA ATGCGTGCCA 5640

GGGTATAAAC CTGCTGCTGT TCATTTTGCA CTTCACGAGA TCGCTTCAGG TATTGCGCCG 5700

TATCGCCyTG TCGGGAGGCG ATTTGATCCA GCAGCGTTCC CTGCTGCCAG GTGAAATCCT 5760

GCACTAAAGA ATTAAGCTCG GTAGTAAAAT CATCGTGTAA CCAGTCAATC CTCGCTGATA 5820 

GCTCACTCAC CTTTTCCCGT AGTAAAAACA TGTTGTAAAG CGCACGATCC AACTCGGATA 5880

ACAGTGATCG ACTGTCCTGC AAAAtGACCG TCAGTTGTTG GCGTTCCCGG GATGACAGCC 5940

CCCGACTAAG CCGTTCTATG GTGTCGAGAT GCTGAATAAT CTGGGTACGA AGTTGCAATC 6000

GCACCGTGGT GTTGGGAGCC TGCAAAAATT CATTTAGCTG GTCTACCACC AGATTCAGGT 6060 

TCCGTTCAAT AAGGAAAGCA GAGTGAATAC GGGGAAAATA CTCATCCAGC GAGTAACGAA 6120 

TTTGTGAGGT TTGTTCATGC CATGAATACA GACTGACACT ACTGACAATC AGGGTCAGAA 6180 

GTGCCCCCAT CAGAAATGCG CAACGTAAGC TGGTACTGAT ACTGACCTGT CTTAAACGCT 6240

GCCACAGCGT TATGTTTTTC ATTTCAGCTG TTCCAGTTTT TTTATCGCCA GGCGCTGGTT 6300 

ATTCAGAAAC CAGAGTTGCC ATTCGATCAT TTGCTGCTCG GCAAAGCTTT TGTTATCGAA 6360 

CTGTGCCAGC CAGACGGGAT CTTCACTGCT GGCCGCTGCA ACGGGCACTT GTGTTAACAG 6420

TGCACGTATT TCTGGTAATG GTTTCTTCAG ACGTGCCTCG GTAGTGTGGA GGGCTGGCCA 6480

GGCATCTTTT AGCTGTGCTA ACCGAAAGCT AATTGCCGTA TCAAACAAGG GCTGCACCAG 6540

ACGCTGACGT TTCAGGATAA GGTGATAATT CAGCGGGGGT TGATTCATCA GGAGCTGTTG 6600 

TTGGGTTGCC CGCGGATTGT CTGCGGCAAG TGGTGTCACC GGATATTTTG CTGTATTGGC 6660 

ATCGGCCAGA ATACGCTGTC CTTTGGGACT TAACAGGTAG TGAATAAAGC GACGGGCTGC 6720

ATCGACGTGT GGGCTTTTCC TGAGAATTGC AAGGTAGGTG GGGGATACGG CAGACCGGGG 6780

GAAATAGGTA AAAGAGAGAT GGGGGTCATT TAACAGTAAA TTAGCATAGT TATCGATAAC 6840

GGGGCCGGCA ACGGCGAGTC CGCTTTTTAT TTTAnTCGcT ACGCCAAAAC TGCGGGAGGA 6900 

GATTGTCACC AGGTTTCCTG CACTTGTCAG CAACGTTTCC CATCCTTTCA CCCAGCCTTT 6960 

TTGCTGTAGT AATGACTCAA CCATTAAATG GTTAGTATCT GAACGCGACG GACTACTCAT 7020

CAATAAAGCG TCCTGATAGA TCGGCAAAGC AAGATCGTCC CAGTCAGCAG GGGCAGGAAG 7080

GTGTTTTACA GAAAGCGCCG GACGATTAAT GAGCAGACCA AAACCTGATA TTGCTAGTGC 7140

AACGGAGGTT GCACGGATCG ACTCCGGCAC CAGGTTTTGG CTTTCTGCGG GTGCATCATC 7200 

AAACGGGGCC AGTTTCTGGT GCTCCTGAAG GTGCTGGAGC AGCATTGGTG ATGAAGTCAG 7260

GATAAGATCG ACGTTTTCTA CGTTGGCCGT ATCAAGCAAc TGTTCCAGTG AGGCACTGGT 7320

GCGGTTAAGC GTACGGATCA TTACCGACTC AGGCTCTGTT TGCCAGCGCT GTATTATCCA 7380

CGCGGTAGCT CCGGGTGAGA ATGTGGTGGC CATCACCAGT TCATTTCGTT GAGCCcTGAC 7440

GGCCCCGGCG TCCATCAGCA ACAGTAAAAG AATCATGGTT TTGATGCCGA TTTCGCACCA 7500

GCTAAAAAAT CGGTTTGTGA TCCAGGTCAT AAATATTAAT ACACCGCAAA AATCGCATTG 7560

AGACAAAAAT TACCCGTTTC AGACATTCGT CTGATAACAC GTCTGCTCAA AGAGACCGTT 7620

AATATATTAA TCAGAGATTA CCCGATAATC AGGATGAGAT TTGTTAATAT CCGCACATGC 7680

TAACAACAAA CCAGATAAAG CATAAATCTA CCTTGTCTAT GCATCAATAA AATGGGTCAA 7740

AAACAGGGTT TGATTTTATT ATTTTGTGTC AATTGTGAGA GATTTTTTCA GTTTGATGTT 7800

TCATYTCAAT TATATGAGTC TCATTGTCAG AATACTCCTG ATGTTCATAT CAATATAAAA 7860

TACAGGTGAA GACATGTTAT CAATATTTAA AACGGGGCAA TCGGCGGATA GTGTTCCGGT 7920

GGAGAAAATT CAGGTGACAT ATCGTCGCTA TCGTATGCAG GCGTTACTTA GCGTATTTCT 7980

GGGGTATCTT GCATACTATA TCGTGCGTAA TAATTTCACT TTATCGACGC CTTATCTTAA 8040

AGAGCAATTA GATCTCAGCG CCACACAAAT TGGCGTACTG AGTAGCTGTA TGCNTATCGC 8100 

CTATGGTATC AGCAAAGGAG TGATGAGTAG CCTTGCCGAT AAAGCCAGTC CGAAAGTCTT 8160 

TATGGCGTGT GGGCTGGTGT TATGTGCCAT CGTTAACGTT GGCCTGGGAT TCAGCACTGC 8220 

ATTCTGGATT TTTGCGGCAT TGGTTGTTCT GAATGGTCTT TTCCAGGGAA TGGGCGTTGG 8280 

TCCTTCTTTC ATCACTATTG CTAACTGGTT CCCTCGCCGG GAGCGTGGTC GGGTTGGTGC 8340

TTTCTGGAAT ATCTCTCATA ACGTCGGTGG TGGTATTGTT GCCCCTATTG TTGGTGCCGC 8400

TTTTGCCCTA CTCGGCAGCG AGCACTGGCA AGGTGCGAGC TATATCGTTC CGGCCTGCGT 8460

GGCTATCGTT TTTGCGGTAA TTGTGCTGAT TCTCGGTAAA GGTTCCCCAC GTCAGGAAGG 8520 

TCTACCCTCT CTGGAAGAGA TGATGCCGGA AGAAAAAGTC GTCCTGAATA CCCGACAGAC 8580 

GGTAAAAGCA CCAGAAAACA TGAGCGCCTT TCAGATTTTC TGCACTTATG TATTACGCAA 8640

CAAAAATGCC TGGTATGTCT CACTGGTTGA CGTATTTGTA TACATGGTGC GCTTCGGGAT 8700 

GATTAGCTGG TTGCCTATTT ACCTGCTGAC GGTGAAACAT TTTTCTAAAG AACAAATGAG 8760

CGTCGCGTTT TTATTTTTTG AATGGGCCGC AATCCCTTCC ACGCTACTTG CCGGTTGGTT 8820 

GTCAGACAAA CTGTTTAAAG GGCGTCGTAT GCCATTGGCG ATGATTTGTA TGGCGCTGAT 8880 

TTTCATTTGC CTGATTGGCT ACTGGAAAAG TGAATCGCTG TTTATGGTGA CAATTTTTGC 8940

TGCCATTGTT GGTTGCCTGA TTTACGTTCC ACAATTTCTG GCTTCCGTTC AGACTATGGA 9000 

GATCGTTCCC AGCTTTGCTG TTGGTTCTGC AGTAGGCTTA CGCGGTTTTA TGAGCTATAT 9060 

CTTCGGTGCG TCTCTGGGCA CCAGCCTGTT TGGTATTATG GTCGATCATA TTGGCTGGCA 9120 

TGGCGGATTT TATCTTCTTG GCTGCGGTAT TATTTGTTGC ATCATTTTCT GCTGGTTATC 9180 

ACATCGTGGT GCAATTGAAC TTGAACGTCA CAGAGCCGCA TATATAAAAG AACACTGATT 9240

ACCTTCCCCA GGGCCGTCTC CCTGGGGAGT GGAGTATATT ATGATTTATA AGATATCTGG 9300 

AAATCAGAGA TTAATATGGA AATTTTATAA GACTGATTAC AATAAATGGA GATGGTATTG 9360 

TCATGAGAAA AATGGATATC TTTTGTCTCA ATCAGATAAC GCATATAATT CGCAATTGTT 9420 

ATGCATTGAA AATGCTAAAA AACAGGGATA CTCAGACGAA TCGGTCTTGC CACTTTTTCT 9480 

ACATATTTCC TATATTCAGG AAAAAGGCTG GAAATGGTAT CAATGTTATG ATTGTGGATA 9540

TATTGTAAAA GAAACCTCTG TTTTTTTTTC GACATACCAG GAATGTGTCA ATGATGTTAA 9600

AAGGAATATA CTAGCATCTA TGTGTAGTGG TTGTAGTGGC ACAGTAAATT TGGCCACCTG 9660

ATTAAAGGTG ATATTCTCAC CACAACATAA AACAACAAGA AAACAAAGCG TACCTTCTCT 9720 

CCTGAGTTTA AACTGGAATG CGCCCAACTT ATCGTTGATA ACGGTTACTC ATACCGGGAA 9780

GCTACTGAAG CTATGAATGT TGGTTTCTCT ACTCTGGAGG CATGGGTACG TCAGCTCAGA 9840

CGGGAACGTC AGGAGATCAC GCCTTCTGCT GCAGCACCAC TCACATCAGA GCAGCAACGT 9900 

ATTCGTGAGC TGGAAAAGCA GGTGCGTCGT CTGGAGGAAC AAAATACGAT ATTAAAAAAG 9960 

GCTACCGCGC TCTTGATATC AGACTTCCTG AATAGTTACC GATAATCGGG AAACTCAGAG 10020 

CGCATTATCC GGTGGTCACA CTCTGCCATG TGTTCAGGGT TCATGGCAGT AGCTACAGAT 10080 

ACTGGAAAAA CCGTCCTGAA AAACCAGATG GGCTGTATTA CACAGTCAGG TACTTGAGCT 10140

ACATGGCATC AGCCACGGTT CGGCCGGAGC AAGAAGCATC GCCACAATGG CAACCCGGAG 10200 

AGGCTACCAG ATGGGACGCT GGCTTGCTGG CAGGCTCATG AAAGAGCTGG GGTTGGTCAG 10260 

CTGTCAGCAG CCGACTCACC GGTATAAACG TGGTGGTCAT GAACATGTTG CTATCCCTAA 10320

AAGCAACAGC AAACAGCGAC CACTGGGGAG CCCTGCATTG CGGGATTGTA TTGTTCAGCG 10380 

GGCCATGCTG ATGGCGATGG GGCCGAGGAG AGTGATTTTC ATACGCTCTC ATATGGTTTT 10440

CGACTTGTGC GAAATGTCCA CTACGCGATC CGCACGGTGA AACTGCAACT CACCGACTTC 10500 

AGGGGAAACT CGGGGCCGCT GGGTAATGTC ACATAAAAGT TCTTCGGTGT GATAAACAAC 10560 

GAGAGTATTT GATTCCTTTA TGGTGGCCTG GTGCAGAGCT GCCCTTTCCC AGGACCTCCA 10620 

TATAATTTTT GTAGCGGCAG TCAGTGGCAC ACTCAGTTAA CTACTTTCAC TTCAGTGACT 10680 

TTGAATGAGT CAGGGCTGCC GTTAAAGGTG TTAATGAAGG CTTGTATTTT CCACTTCTGG 10740 

CCTGGTTCAA GATTGGATGC TGTGTCGATT GTTTGACCGA TAACGACTCC ATCTTTTAAN 10800 

AGATTAAATT TTACATAAGC ATTTTTGACA ACAGAGTTTG ATTTATTTNC AGCATAACCC 10860

ACAATTGCCT TCGTCCCACT TGGGGTGTTT TCCACATGAA GGTTAG 10906 
(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7430 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
ATGGTTATTT TTATTTCCTG CACCTTGCTT CATTTGAAAT AAAAACATAT GCATACGACG 
CTGCCATTGA GCAGAAAAAT ACAGGAATTA ATGTTATGAG TTAACCATAA TACCTGTGTT 
ATGAATATCT GACATAAACA AGAACAATTC ATATCTTCTG TATTCAGCAG AATAATAAAA 
GTTCGTCTGC CATTCTCAAA CTTATTCTTC GGAATACGTT GTTTCATGAA AGAAGGGGCC 
GGAATAAAAG CTGGTCACCG TAATGCTAAT ATTAATGCAG ACTACCGCCT TCTGGAATTA 
ACAGTCATCA ACCAGCACAA ACCATTAGCA ATCAAACAAA TTTTAATTAA CAAAATTTTA 
GCTAATACAA TTACTGCATT AACCACTCTG CAGTTTGCCT TCTCAATAAG TTACAGATGC 
CAAACAATAC TCTTTTATAT GTTATAACAT AACACAAACA ATAAATAAAG AACAGACGGC 480

ACTCCATTTC TCCACGTAAG TGAGCCATCA GAATCGCTTA TGAATGTGTA CGGCAGACGT 540

ATACTCGTGT TTTACTGCAG CAACCGGAGC AAAAGTTGCA CTTCCACAGC CTGGGTTAAG 600 

TTTTTCATGC TTGTGGGCTC GTCCTCCCTC CATTTCCACC GCGGGCAAAC AAGGCCATCT 660 

TTTGTCTGGC CACACAGCAG ATGGAGAGTO GAATTATGCT GTCTGACGAC ACCGGGAACA 720

AATATGCCAT GCCTTCGCAC AATGAACCCG GGCATCATCG TTTTATCTTT ATAATCGAGA 780

CAGGTATGAG GGAAAGTCGG ATGATAAGCA GATAGTGAGT GAGGCGCTGG AACATGGCGC 840

TCTGGCAAGA GAAGTGTCAC AGGTTAGCTG ATGATATGGG GCAACCTGAT ATCTACTTAC 900 

TTTTTTGCCT ACTCTCTTAC TTCATGCCAG CAGCGAGGGT ATCGACATTG TGTTTGAACG 960 

CTGCCGTGTA GGTAGCAGCG AGGCCGCTAC TGTCGGTAAG TGCTTCCGGA TAAAGCTCTC 1020 

CTCCCGCTTG TGCACCACTG GCATTGGCGA TTTGTTTCAC CAAACGGGGA TCTGTCTGGT 1080 

TTTCGATAAA GTACAATTTT ACGTGCTCTC TCTTAATTTG ATTAATCAGT TTCGCCACAT 1140 

TTTTACTGCT AGCTTCCGAC TCAGTGGAGT ACCCCACTGG CGACAGAAAG CGAACCCCGT 1200 

AGGCGGCAGC GAAATACCCA AACGCATCAT GACTGGTCAG TACTTTACGT TTTTCTCTTG 1260

GAATAGCAGC AAACGTCTGC GTGGCGTAAT TATCCAGTTG CTTCAACTGC TGGATATAGC 1320 

TGTCACCCTG TTTTCGATAA TCGCTGGCGT GCTCCGGGTC TGCTTTGCTC AGGCCATTGA 1380 

CAATGTTGTG AGCATAGACA ATACCGTTTT TCATGCTGTT CCAGGCGTGC GGATCAGTGA 1440

TGGTGATCCC ATCCTCTTTC ATTTTCAGTG TATCTATTCC GTTAGACGCG GTAATTACCT 1500 

CACCTCTGTA GCCAGAGGCT TTCACCAGAC GGTCCAGCCA TCCCTCCAGT CCCAATCCAT 1560

TGACAAAGAC AACATCCGCC TGTGCCAGCG TTTTGCTGTC TTTCGKCGAC GGTTCAAATT 1620 

CATGTGGATC ACCATCCGGT TGCACCAGAT CAGTGACATG AACGTATGGG CCGCCAATCT 1680 

GGCTGACCAT ATCGCCCAGT ACCGAGAAAC TTGCCACCAC ATTCAACTCT TTTGCAATCA 1740

CCAGTGGGCT CACTAGTAGG CTGGACAGTG CCACAACCAA AATGGACCGT TTCATCTTTC 1800 

CTCCTTCATC TCGTTGCTAT GTGTAAAAAC ACTTCTTGTC AGCGACATCT GCATAACATG 1860

CCGCCATTAG AGCCAAACAG AACTGAAAAG CAGAAAAACA GAGTGCTCGT GAGGATGACT 1920

GCAGGACCTG CAGGCAAATC AGCGTAATAA GACCAGATCA GTCCAACCAG ACTGGCGCAG 1980 

GTACCAATAC CCACTGCAGC TAACAACATG ATGGACAGAC GTTGACTCCA GAAACGCGCG 2040

CTGGCAGCCG GTAACATCAT AATACCGACT GTCATCAGGG TGCCAAGTAG CTGGAAACCT 2100 

GCCACCAGAT TGAGTACCAC CATTGACAAA AACAGGCAGT GGATCAGCGC CCGCGACCGA 2160 

CGTGACAGAA CTTTCAGGAA AGTGACATCA AACGACTCAA TCACCAGCAC CCGGTAGATC 2220 

AACGCCAGTA CCAGAACCGA ACCGGAACTA ATTATGCCGA TAGTGATGAG AGCATTGGCG 2280

TCAATAGCCA GAATGGAACC GAACAGCACA TGCAGCAGGT CGACACTGGA GCCACGCAAA 2340

GAGACCAGGG TGACGCCAAG TGCCAGCGAG CCGAGGTAAA ACCCGGCGAA ACTGGCGTCT 2400

TCTCTCAATC CAGTGCGGCG GCTGACCACA CCAGACAACA TCGCCACAGA CAGCCCGGCA 2460

ATGAAGCCAC CGACTCCCAT CGCAACCAGC GACATGCCCG ATACCAGGTA GCCAATTGCT 2520

ACTCCCGGCA ACACCGCATG GGACAGTGCA TCACCGATCA GGCTCATACG GCGCAGTAGC 2580

AAAAAACAGC CAAGTGGCGC GGCGCTCAGG GTCAACGCCA GACATCCGAC CAGCGCCCGA 2640

CGCATAAAAC CGAAATCGCC AAATGGCTCG CACAACAGGT GCAGTAACAT CATGGCAGCA 2700 

GCCCCTGCTG CGGTGGCGTG GCTGCAGCCG TGAGGGAATG GAGTATATCG GCACTTCTCC 2760 

CCCATCGGTG GCCTTCCGCA CTGAGCATCA GTACATGAGG AAAGTATTTT TCTACCTGTT 2820

CCATGTCATG CAACACCGCA AGAATTGTAC GTCCTTCCAG ATGTAGCTGC CGAATAACAA 2880

CCAGCAGAGT ACGGATAGTC TGAATATCAA TGCCAGTAAA TGGTTCATCC AGCAGAATAA 2940

CCGACGGCTG CATCACCAGC AGTCGTGCGA ACAGTACGCG CTGTAACTGA CCACCGGAAA 3000 

GTGTGCCGAT GTGCATCGGC GAAAATTCTG TCATACCGAC GGTATCCAGC GCTTCGATAG 3060 

CTTTTTTTCG CCATAGACCG GAAATACGAC CGAACATCCC GCTGTGTGGA ATACATCCCA 3120 

TCAGCACCAG ATCGTTAACA CTCAGTGGAA ACTGGCGATC AAATTCAGTC AATTGGGGCA 3180 

AATAACCTAA CTGGCGTTGC CCCTGCGGTG CCATGCAGAA GCAACCACCC AGAGGTGGCA 3240

GCAGACCGGC CAACGTTTTA AGCAAGGTGG ATTTACCTGT GCCATTCGCT CCGATAATGG 3300 

CAGTCAGTGA ACCGGTGTCA AAACATCCAT TCAGCGTACC CAGCGGGTGC TGTCCCGAAT 3360 

AGCCAAATGC CAGTGAATGT AATGCGATCA TGTCAGTACC ACCGCCCAGG AAATAAGAGT 3420

CCATAACAGT ACCAGCAGCA CACCGACGAT ACCCAGTCGG GCTATTGCGG AAAAAGCATA 3480

AAGACTGACC ACAGTATCCC CCATCAAAAT TGTTATAGTA TAACATTATT GCTTTATGGG 3540

TGCCGATGAT AGGTAAGAAA ATGTGTCATG GCTTCTGCAG CGTAAGCATA CAGCGAGAGC 3600 

AGTATTGACA GGGATGCGTT AGTCATTTAG CAGTGTAATG CGCTAAATAG NTGCGCGGAA 3660 

TAGTAGATCA CTTTGAGGGT ACTCAGCCCG GATTGTGCGC TCTGATCAAT CGCCAAATCA 3720 

AAACAAATCA CCAACCGAAC TGAGCAATGC CGATCATAGC ACCAATTTCC CGTGACGAAC 3780

GACACCGGAT GCAGAAAGCC ATCCATAAAA CACACGATAA AAATTATGCC CGCAGACTGA 3840

CTGCCATGCT GATGCTGCAC CGGGGCAACC GTATCAACGA CGTTGCCAGA ACGCTCTGCT 3900 

GCACCCGTTC ATCTGTTGGA TGCTGGATTA ACTGGTTACT AAAATCATTC CCTGCCGGGC 3960 

GTGCCCATCG CTGGCCATTT GAGCATATCT GCACACTGTT ACGTGAGCTG GTAAAACATT 4020 

CTCCCGACGA CTTTGGCTAC AAGCGTTCAC GCTGGAATAC AGAACTGCTG GCAATAAAAA 4080

ATCAATGAGA TAACCGGTTG CCTGTTAAAT GCCGGAACCG TTCGCCGTTG GTTGCCGTCT 4140

GCGGGGATAG TGTGGCTAAG GGTTGTGCCA GCTCTGCGTA TCCGTGACCC GCATAAAGAT 4200 

GAAAAGATGG CAGCAATCCA TAAGGCACTG GACGAATGCA GCACAGAGCA TCCGGTCTTT 4260

TATGAAGATG AAGTGGATAT CCATCTTAAT CCCAAAATCG GCGCTGACTG GCAGTTACGC 4320

GGACAGCAAA ACGGGTGATC ACGCCGGGAC AGAATGAAAA ATATTATCTG GCCGGAGCGC 4380

TGCACTGCAG GACAGGTTAA AGTCAGCCAT GTGGGCGGCA ACCGCAAAAA TTCGGTGCTG 4440

TTCATCAGTC TGCTGAAGCG GCTTAAAGCG ACATACTGTC GAGCGAAAAC CAGCACGCTG 4500

ATCGTGGGCA ACAACATTAT CCACAAAAGC CGGGAAACAC AGCGCTGGCT GAAGGAGAAC 4560

CCGAAGTTCA GGGGCATTTA TCAGCCGGTT TACTCGCCAT GSGTGAACCA TGTTGAACGG 4620

CTATGGCAGA CACTTCTCGA CACAATAATG TGTAATCATC AGTACCGCTC AATGTGGCAA 4680

CTGGTGAAAA AAGTTCGCCA TTTTATGGAA ACCGTCAGCC CATTCCCGTA GGGGAACATG 4740

GGCTGGCAAA AGTGTAGCGG TATTAGGAGC AGCTATTTAG GAGAACAGCT CGCTGACCCG 4800

GTTGACTATG ACTCAAGCCC ATGACGAAGA TAGCTTTCTG GATCAACATC GTTCAGTCTG 4860

CACGTCCCAA TCCAGCCACC AGCCACCAGC CACCAGCCAC CAGCCACCAG CCACCAGCCA 4920

CCAGCCAGGC TACAGTGCCA TCCCGACCTC CCCACGTAAA CCCAGGGACA GGCTAAAGGC 4980

AGAAAATGGG GAAGGCAGTA TGACTCTCCG TGACACAGAT GCGGGTACCT GATGGGAGTG 5040 

AGATCATCTT CCCCTCCCGG TCAGTTCCCG GATCAACACC GTGAGCAGCT CTGGCGAAGG 5100 

TTTTTCCAGC GTCATTTTAC CGTAACGAAA TTCAACCTTA CAGGAACTGG CACAGACTGT 5160 

GCACTAAGTG GCAGTGGATA AAAGCGGAGT AAGAGCCGCC ACAGGCTCTT TCTGCTCATC 5220 

AGGCATTATC TCAACAGGTA ATAATTCAAC GCCAGCGCCA GAAGAGGTTG TTACCGGAAG 5280 

ACGCCGCGCC CCCCTTCGTT CAG 2CAGAGC CTGAGCCATT TGACCAGGAG GTTATCATTG 5340 

ATATCGTGTT CCTGGTCAAT ACGGGCAACA GAGGTGCCTA CGACGTTTTT TCAGTTCGGT 5400

TATCTATTGA CTTAACTCTT TGGCCAGTAA TGCTGCAGCC CCCGTGCCAT GAATAAACGA 5460

GTGGTCGCAG ACCACGCAAC ATGCAACATC ATTCAGATCC CCCGCTAATA TTACAGGTAA 5520 

TTCAGAATCA GCAATACTTT TCCCGACCAT TAAAAGTTCT GAGTCACGAT CAGTTGACTC 5580 

ATCACTTTCA GTCGGGCTCG GTGGAACAGG ATGAAGACAA TGTAATCTTA TTCTCAAACC 5640

TTCTGGCATA TGAACTATCA TATTCATGGA GGGAATTTCC TTGTCCACTA AATACTGTAT 5700 

TTCTGCATCA CTTAAAATCA TCCAGGAATA TACATGCATG CCATATAAAT TTTCTTTCGG 5760

GCATTTCAGG GAGTATGGAA ACACTTCATC CAGAGGTGAT AGTTTCTGTT CCCACCATAA 5820 

GTTTGTTTCA AGAAGAACAA GTATATCAGG TTTTTCTTTA TTTATAAGTT CAAGAATGGG 5880 

TATATATTTT TTATTGGTCA TAAGAACATT GAATACCAGT ATACTTAAAC CCAGAAATCC 5940

ATCAGAGTCC TTTATTTCCT TTACCTGCTT CTTGCCAATT ACTGTATAAG GAATTATCCA 6000 

TACCAACTGG TAAGCGACAC AAATTAAACT TATTATCCCA ACAAACAACT CTGTAAATAA 6060 

GTCAAGAAAA ACAACAGACA GAAAAACATT CAAAGTACAC AGCAAAAGTA TCTGTAGTCG 6120

GGGAAAATCC CATCCCCCGA CAACCCATGA TGTATTACCG GAAACAGGGA TAAAAGTTAT 6180

GACTGCCAGA AGGATAGCAG TAAAAATAAA AACACAAGTT ATCACAAATC GCTCCTTGTT 6240

CTGAACCGGA ACACAAAACT GTCATATACG TTTCAAAAGT AAAAATACAC TGCTGCCACA 6300 

AGATTTACAG CGTAACCGGA CAGCATATCC TGATTACGGA CAATCCATGA AACCGCCTCA 6360 

CCAGAAGCGT CGATCACATC CGTTTTTTCC CTGTTTTATA TTCCCCGAAA CATTTTATTT 6420 

TCAGGAATCT CCGGGCCTTT ATGCCGCATC ATTGCAAAAT GGCATCTGAA TCGATCATGA 6480

TTTGGCATCC ATCTCCGATC ACAGTTTGGC ATCACAATGG ATCACGATTT GGCATGCTTC 6540

CGATCATTGA TTAGCATCCT GCCAGTCAGT CCGGGAATTA ACTCTTTTCG CCACAGTCTT 6600 

CATTGCCGTG TTTAAACCAA TGGAGACGGC AATGTCCAAA AAGAGAATAT CCAGGAGCAC 6660 

TATGGATAGC TGTTTTAAGA TCCTTCAGCT CAAGTTCGAC CAGAAGCTGG CTAACCGTTG 6720 

TATCGGACTT GCAAAACACC AATGGGGATT GATCTCTATT TTGCGACACA GACGCATTAT 6780

CAATACATGG ATGGTGCGAT CAAATACCTG AGTGGTCTCA CCGTGGATCA AATCCAGCAA 6840

TTGCTCACAG ATTAAGACTC GTCGGGAGTT TTGAGCCAAC ACCAGCAGTA ACCGATATTC 6900 

ACCTTGAGTG AAATCTACAG GCTGTTGATG AGCATCAACC AGCACGTAAC GGTCCGGGAT 6960 

CAAGTGTCCA GGCGTTAAAA AAACCACTCT ACTACCCTGC TCGACCTAAG CGTCGGCGTT 7020 

CAGCCGCCTG AACGGGTATG GCAAGGGTGA AAAGAAACAG CATCCCCACA GTACCGACCA 7080

GACGACAGGA TGATGCTGGA ACAGAAAGCA TTCGCACCTC TCTTAGAATT AGACAGTGCG 7140

TACAGGATAC GTAAGACAGG GTGACGGGGC GGCGATAAAC TCTATTTACA AAGCTGAAAA 7200

TTTTCTGACG ATGAAAAACT ATTCAAGAAG GTTATCTGAG GCGTTAAAAT AACCAGCTCG 7260

ATTAACGACT AACTTGAGGT GAATATGAAT TTAAAAAATA TAATTTTAAG TACTGTTTTA 7320

TCAATCGCTA GTTGTCATGC CCTGGCTGTA GGTAATTCTC CAAATAGCGC TATCTAACCT 7380 

TCATGTGGGR AAACACCCCC AGTGGGGACS AAGGSCAATT GGTGGGGTTA 7430
(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6681 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

AGATTATTCT GGCTCAGATT CATTTTTCAT CAGTCGCTTT CCCCTATAAA CCGTAAGGTT 

CCATAGTGTC GACGCTCTCG CTTAATTCCC ATATCGTCGA TAGTCTTATT AGCCGCTTCT 

GTCAGGTCAG AAAAAGTATC ACGCTTCTTT GGGAGTTCAA GTCAGATTTC TCGCCGTCGG 

GCGATGCGCT CAAAATGTTT GTCTGTATGG GGTCGCTTCA TCACGTCAAG CCATCGCGCT 

GCCGCTCTCC GCCAGAGTAC AAGCTCTTCC AGTTGTTCTG CTTTTTATCT TATCTGTGGC 
GATGCAGTAT CCTCCTCCGT TTGTGTAAAT CGTTGAGTGG TGAATCACGC AAAGGGGCTT 360

CTTTTTTCTG ATCTATCCCC ATATTCTTTA GCGTTCTGGT CGCAGCATCT CTGATGTCGC 420

AGACACTGAA CCTTTGTATT TTCCATGATC TTGTGGAGTT TTCGATACAT CTGCTCCGAT 480

GCTGGGTTAT AAAGATCCGC TCTTTATCAT CCTTGGCTTG TGTAAGCAAT TCTCCCCAAC 540

GTTCTGCTGC ACGCCGCCAT AACTCTCTTC TTTCCAGTTC CTCAGCTTTT TCATCATGTA 600

CCATTCGTGT ATCCCCGTTT ATCCAGTCTG AACCGCACCG GGTTTCCTGG AGAATGTTTT 660

CTCTGTGAAC TCAGGCTGCC AGATCATCGT TTCCGATGGA AGCATAATAA GCTTTTTCTG 720

CTTCTGCCGG ARGAATATGG CCCAGCTTTT CCAGCAATCG TCGATTGTCA TACCAGTCCA 780

CCCACGTTAG TGTGGCCAGC TCCACTTCTG TCCGTTTTTT CCAGCTCTTA CGGTTATTAC 840

CTCCGTTTTG TAAAGACCAT TGATGCTCTC CGCCATTGCG TCGTCATACG AGTCGCCTGT 900

ACTCCCTGTT GATGCCAGTA ATCCGGCTTC CTTAAGCCGT TGCGGACACA TAATGAGAGC 960

CTTTATCGCT GTAATTGTCA ACGACGGATG AAAAGTGATC CACTTATATC TCCACCAACG 1020

GCCCAATATT GATCCACCGT TTTACTCAGG ATTAGCTTCT GCTATAACCC CGGCCTTTCG 1080

TTTCTGTCTG AGTCGATAGC TTTCTCCTTT GATTTGAACG ACATGTGAGT GGTGTAAGAT 1140

ACGGTCCAGC ATCGCTGAGG TCAGTGCTGC ATCACCGGCG AACGTTTGAT CCCACTGCCC 1200

GAACGGCAGA TTGGATGTCA GGATCATTGC GCTCTTTTCG TAACGTTTAG CGATGACCTG 1260

GAAGAACAGC TTTGCTTCTT CCTGACTGAA CGGCAGATAG CCTATTTCAT CAATGATGAG 1320

CAGGCGGGGG GCCATTACTC CACGCTGAAG CGTCGTTTTA TAACGGCCCT GACGTTGTGC 1380

CGTAGATAAC TGAAGTAACA GATCTGCTGC TGTTGTGAAG CGAACTTTGA TACCTGCACG 1440

GACTGCTTCA TAGCCCATCG CTATTGCCAG ATGGGTTTTC CCCACACCTG ATGGCCCCAG 1500

TAATACGATA TTTTCATTAC GTTCTATGAA GCTGAGTGAG CGTAACGACT GGAGTTGCTT 1560

CTGCGGTGCT CCGGTGGCGA ATGTGAAGTC ATACTCTTCG AACGTTTTCA CCGCCGGGAA 1620

GGCTGCCATT CGGGTATACA TCGCCTGTTT ACGTTGATGA CGTGCCAGTT TTTCTTCATG 1680

AAGCAGATGC TCCAGGAAGT CCATATAACT CCATTCCTGG TCTACTGCCT GTTGTGACAG 1740

CGCAGGCGCT GCGCTTATAA GGCTTTCCAG TTGCAACTGC CCGGCGAGCG CCATCAGTCG 1800

TTGATGTTGC AGTTCCATCA TCACGCCACT CCTCTGCAGA ATGAGTCGTA GATGGAGAGT 1860

GGATGATGCA GGGGGTGTTT GTCGAAGTTC ACCAGATTTT CATCAAGATG CACGTCATAC 1920

TCTTTTTTCT CCGGAGCAGT GCCAGCATGG ACTGCTGTCT TCGAGCCAGC GATCGCAGGG 1980

ACGGGCCTGG ATTGTTTCAT GCTTTCGTTG GTTAGCGACA TCGTGCAGCC AGCGCAGACC 2040

GTGGCGGTTG GCTGTTTCAA CATCGACAGT GATCCCCATC GGGCGCAGGC GAGTCATTAG 2100

TGGGATGTAA AAACTGTTAC GGGTGTACTG CACCATCCGT TCCACCTTAC CTTTAGTCTG 2160

TGCCCTGAAG GGGCGACACA GTCGGGGAGA GAAGCCCATC TCCTTGCCGA ACTGCCACAG 2220

CGAAGGATGG AACCGGTGCT GACCGGTCTG ATATGCGTCA CGTTGCAGAA CCACAGTTTT 2280

CATATTGTCA TACAACACTT CGCGCGGCAC ACCACCAAAG AAGCGGAACG CATTACGATG 2340

GCAGGTCTCC AGCGTGTCAT AACGCATATT GTCAGTGAAT TCGATGTACA GCATTCGGCT 2400

GTATCCGAGA ACAGCAACGA ACACGTGAAG CGGTGAGCGA CCATTACGCA TAGTGCCCCA 2460

GTCAACCTGC ATCTGTCGTC CGGGTTCAGT TTCGAACCGA ACGGCAGGCT CCTGCTCCTG 2520 

AGGAACCGAG AGAGAACGAA TGAATGCCCT GAGAATGGTC ATTCCGCCAC GATATCCCTG 2580 

GTCTCTGATC TCGCGAGCGA TTACCGTTGC CGGGATTTTG TAAGGATGAG CATCGGCGAT 2640

GCGTTGACGA ATATAATCCC GGTATTCATC CAGGAGTGAA GCAACAGCAG GTCGCGGCGT 2700 

ATATTTTGGC GGCTCAGATT TTGCCTGCAA ATAACGTTTA ACCGTATTGC GGGAGATCCC 2760

CAGTTCTCTG GCAATCGCCC GGCTACTCAT TCCCTGCTTG TGCAGGATTT TAATTTCCAT 2820 

AACTGTCTCA AAAGTGACCA TAAACTCTCC TGAATCAGGA GAGCAGATTA CCCCCTGGAT 2880

CTGATTTCAG GCGTTGGGTG TGGATCACTA TTGCACCGTT CGTGACAGTA ATGGATTGTG 2940

TCAGACGGAC GACGGGCCCA TAACGCCTGC TCCAGTGCAT CCAGCACGAA TGTTGTTTCC 3000 

ATGGACGATG AGACTCGCCA TCCCACGATG TATCCGGCGA ACACATCAAT GATGAACGCC 3060 

ACATAAACAA AGCCCCGCCA TGTGCTTATC CCGGTAAAAT CAGCTACCCA CAACTGGTCC 3120 

GGGCGTTCTG CGATGAACTG ACGGTTTACA CCGTTGCATG CGGCAACAGC TTTCCGGCTG 3180 

ATTGTCATGC GAACCTTTTG CAAACCCCAT ATATTTCAGA CGATACCGTT CAACGGTAGT 3240

GAACCCACCA TCACCGCTCC CGGTATCCCG CTCATGCTGG TATACCCAGA CATGCAGGGG 3300 

TTCCAGCGTA CAGCCAATCT TTGGGGCAAT GGAACAAATT GACGCCCACT ACGAGTCATA 3360 

CGACTTTCCA GAACAATACG GAGCGCCCGC TGACGGACCA CCAAAGAGCC GCCATTATTC 3420

TTATTACCTT TAACTAATAA TGCCAATTCA GACCCAAACA CGGCATCATT CGCTTCAGCC 3480

TCTGCGCCAT TAATTAATGC CAGGACTTGG TCAAGAAAGC GTTGCGCTTC GTTTACATCT 3540

GTTGCTTGTC GCAGGTAATA AGGTATTCGT TCAACAAACT CGGAACGTGA TAAAGGCTGA 3600

TGCTCCAGCA AAACCTCAAG CATTGCGGGC CGCAACAAAC GACGCTCAGC ATCAACATTG 3660 

GGAAACTTAA CCTCAATGGC ATATGTGGCA AAATACTTAA GTTGCTCCTT AAGCCCCAAA 3720

TTAGGCATAA GAGAATCAAT TGAGCCAGAC GCCACTGCAG CGCTTGATTC AATTGTTTCT 3780

ACATACTCGT AGGAAGGTAC AACAACATCT GGAGCCAATG TTTTAAGCTC ATGGAGTTGA 3840

CGGATAATCG GGGATAGAAC CTCATCAGGA TTACTGAACC AATCAGTGGA CCAAATACGG 3900

CTAATTCTCC ACCCCAAACG CTCCAAAACC TCTTGACGCA AACGATCACG GGCAGATTTA 3960

GCTGAATGAT AAGCCGCACC ATCGCACTCT ATACCCATTA AGTAACAACC CGGATCTTCT 4020

ACCGACAGAT CAATAAAGAA TCCTGCAACC CCACCTGAGG TTCACACTCA AACCCAGCGT 4080

GATTGAGTGC TTCCATTATA GCAACCTCAA AGTCACTATC CGGAGCCCTG CCCGTATACG 4140

TCGTGAGGGA ATCTAATTTG CCACTTTCGG CAAACTGTAA AAAACCTTTC AACGAAATAA 4200

CACCAAATTT ACTGGTTTCA CTCGTCAATA CATCTTCAGA ACGCATTGAA CTAAACACAT 4260

GCATCCGTTT CTTTGATCGA GTTAAAAGCA CATTCAAGCG GCGCCAGCMA ACATCGGAAT 4320

TGACAGGCCC AAAGCGTTAA TAAACCTTTC CACCATGCTC AGAAGGTCCA CAGGTAAAGG 4380

AAATAAAGAT TACATCACGC TCATCACCTT GAACGTTCTC AAGTTTTTTC ACAAAAAGTG 4440

GCTCTTCCAT GGGATATAAG GCATGAATTG CATCGTTAAA TTCAGTGCGA TTTCGGCGCA 4500

ATTCATCAAT AGCGCGCTCA ATCTGATCGC GTTGCCTGGA ACTGATGGCC ACTACCCCAA 4560

GAGATTCATC CAGCCGGTGT TGCGCATGAT GAAGTACAGC CTCAGCAACT GCTTGGGCTT 4620

CTTCAATATT GTGTTGATTA GAGCAACGAC CTTTTGATAC ATAAGTAAAT TTGATTCCAT 4680

ACTCTGGAGA CTCAGCATTT GGAGAAGGGA ATATCACCAA ATCACTGTTA TAAAAATGGC 4740

GGTTAGAGTA TGCAATTAAC TTTTCGTGTC GTGAACGATA GTGCCAATGC AAACGTCTCA 4800

TAGGAAACAG TGGCAAAGGA GCATCCAAAA TGCCGTCAGT ATCACTTAAA GCCGCGACAT 4860

CATCGTCATC TTCTCCGGCG GAACTTCGAT CTGAAGTGGC ACACTGAATT TGGCCACCTG 4920

AACAGAGGTG ATATGCTCAC CTCAGAACAA CACAGGTGCT CCAATGAAAA AAAGGAATTT 4980

CAGCGCAGAG TTTAAACGGG AATCCGCTCA ACTGGTTGTT GACCAGAACT ACACGGTGGC 5040

AGATGCCGCC AAAGCTATGG ATATCGGCCT TTCCACAATG ACAAGATGGG TCAAACAACT 5100 

GCGTGATGAG CGTCAGGGCA AAACACCAAA AGCCTCTCCG ATAACACCAG AACAAATCGA 5160 

AATACGTGAG CTGAGGAAAA AGCTACAACG CATTGAAATG GAGAATGAAA TATTAAAAAA 5220 

GGCTACCGCG CTCTTGATGT CAGACTCCCT GAACAGTTCT CGATAATCGG GAAACTCAGA 5280

GCGCATTATC CTGTGGTCAC ACTCTGCCAT GTGTTCGGGG TTCATCGCAG CAGCTACAGA 5340

TACTGGAAAA ACCGTCCTGA AAAACCAGAC GGCAGACGGG CTGTATTACG CAGTCAGGTA 5400

CTTGAGTTGC ATAACATCAG CCATGGTTCT GCCGGGGCAA GAAGCATCGC CACAATGGCA 5460

ACCCGGAGAG GCTACCAGAT GGGGCGCTGG CTTGCCGGCA GGCTCATGAA AGAACTGGGA 5520 

CTGGTCAGTT GCCAGCAGCC TGCGCACCGT TATAAACGAG GTGGTCGTGA ACATGTCACT 5580 

ATCCCGAATC ACCTTGGGCG GCAGTTCGCA GTGACAGAGC CAAATCAGGT ATGGTGCGGC 5640

GACGTGACGT ACATCTGGAC GGGGAAACGT TGGGCATACC TTGCCGTTGT TCTCGACCTG 5700 

TTTGCAAGGA AACCGGTAGG TTGGGCAATG TCGTTCTCTC CGGACAGCAG ACTGACCATC 5760

AAAGCGCTGA AAATGGCCTA GGAAATCCGC AGTAAACCAG CGGGGGTAAT GTTCCACAGC 5820 

GATAGTAATA ATGCCGGTAT CAGTTTTTAT CATCACTCTG TTTGCTGTTT AACCAGACTG 5880 

GTGTGATTAC TGATGCAGTG AAGACCTTCC CGCATCCTGA CTCACACAGC GATCGACCCT 5940

TTGTGTCCTG CCCTGGACCT GTCGGTTGCC GGAAGCGCCT TCATGCGAGG CGTCTCCTCA 6000 

CCGATGCGCG TGACTCAAGA AGGGCCTGAC GGTTTGTCTC GTTACTGTCC TGTCCGGGTT 6060 
ATCTGTCTGG AGATTCAACT CTGTTTCCTC ACAGGAGCTG TGTTATGGCA GGTAAAGTTA 6120

CGGAAACCGC TGTTGTGGGT GGCGTGGATA CACATAAAGA TGTGGACGTT GCCGCTGTCG 6180 

TAGATCAGAA CAATAAAGTT CTGGGGACCC AGTTTTTCTC CACAATACGG CAAGGTTACC 6240 

GGCAGATGCT GGCATGGATG ACTTCGTTTG GGGCATTAAA GCGAATTGGT GTTGAGTGTA 6300 

CTGGCACCTA TGGATCAGGT CTGCTTCGCT ATTTACAGAA TGCCGGGTTA GACGTTCTTG 6360 

AGGTGACTGC GCCAGATCGG ATGGAGCGAC GCAAACGGGG TAAAAGTGAC ACGATTGATG 6420

CTGAATGTGC CGCTCACGCC GCATTCTCCC GAATAAGAAC CGTCACACGC AAAACGCGCA 6480

ATGGCATGAT TGAGTCTCTG CGGGTATTAA AAACTTGCCG AAAAACAGCA ATATCAGCCC 6540

GCAGAGTCGC TCTCCAGATT ATCCATTCCA ATATTATCTC TGCCCCGGAT GAATTACGTG 6600 

AACAGCTCAG AAATATGACG CGCATGCAGC TCATCAGGAC TCTGGGATCC TGGCGGCCTG 6660 

ATGCCAGTGA ATACCGCAAT G 6681 
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1342 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 



TATTCGCGCA TACGCGTTGC ACATGTTCTT TTGGCGAACG ATCATCGGCA ATACAGAGTT 60

CCCAATGGGG ATAGCTTTGA GCCAGGACAG AATCCAGACA GGCACGCAMG TAGATCTCCG 120

CTGGATTATA AACAGGAATC ACAATAGATA TAACTGGAGG GTGAGTCATA CTGGCAAGCA 180

TCAGACTCAC CWCTTCKTTG CCAGGCAACG AAGGTAATTC CACCGTTTCT ATCCATTCCT 240

CATAACCGAC AGAAGACGGG GTAACGCTGA ACGTYTCGTT ATAGAATGCT TGCAGGCGCT 300

CTATTGACAT ATCGCCATTG TSCATCAATA TGGATTTTWT GATTTTTTCT AGCGGCATGT 360

CACGATAGCT TTGGTGTTCT TTTTGAATGC GAGCCAATAG TGCAGACTCG ACTACTTTCA 420

CATCAACAGC CGCTATTTCA AACTGATTAA TTGCAAATTT TGCTGCCTGT TCTAATGGAT 480

CAAATCGTAA TGCACAAGAG GCGATTCCAG ATAGAACAAC GACTGACGCT GACCGCTCGT 540

TTATATGGCA ACGTTACTGT TTCAAACTCA TTGAACCCTT TACCTGTATC CAAATRTAAC 600

TTAGCTAATC CTTGCTTTGG TTGGGCAATT AATAGAGATA TTAAATTGAT ACCATCCCTT 660

GCTAATATTT GAGAGCTGCT CCAAATCAAT AATGAAAAAT GGATCATTTC CCTCTGCAAC 720

CCAACTTTGT GAATTATCTA TATCTATCGA GAGCTGATTT GTTGCCAGAT AGGGCAGCAC 780

AACTGTATTT TGCATTTTAC TCACTGCAGG AGAAACGTCC CATGCTTCGC ATGGTTTCCT 840

ACCAAGTAAC ATCCCATAAC GCTTAAAATG TTCTCTTGCT GACAACCCGG TCTGTTTCAC 900

ATCCAAATAG TTATGCAGAT ACCAATGTTC ATCAAAGTGA GCTAGCAACT CGTCTTGGTG 960

ATTTTTAACC ATCACTTTTA TTCTCCCTTA TTGACAGGCA GGCAACTGCG CTGCTCAAAC 1020

TTCCCATACA TAATGTAATG AAGCAGCGGA TTAATGCCTC CTTGGGCCAC ATCCGGATAG 1080

GTTTGCAAAT ACCAGCGAGT ATCAAACTGC TCACTAGGGC TATAACCTTT ATCCGCCCCC 1140

ACGCTAATAA AATGCTCAAG AGCTGAGAGC CCAGTGTCTG CAACCTCTGG GTAGCGATGT 1200

TGATACCAGA GTTCATCAAA CAATCCTGAA GCGGCAANTA CTCCGCGGCA CTCTCTGTAG 1260

CTGTTGTTCT GGATGGAGTC TCCTCCTTAA ATGTTCTGCC AAGAGCACGA ACTGGGGCTG 1320

TAATCTTCCA AGAGACGGTT CT 1342
(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1580 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
CGAAGGAAGC AGTNTGCNGC CTGCGCTGGC GGAGTTGCGC CTGTTCCCAC CGATGATGCT 60 

GTACATGAAT CCTCCGGCGA ACAGAGCGGT GAACTGGAAA CCATGCTTGA ACAGGCCGCG 120 

GTCAATCAGG AACGGGAATT TGATACCCAG GTGGGGCTGG CGTTAGGGCT GTTTGAGCCG 180 

GCGCTGGTGG TGATGATGGC GGGCGTGGTG CTGTTTATCG TCATCGCCAT CCTCGAGCCG 240

ATGCTGCAAC TGAACAATAT GGTTGGAATG TAATTTACGG AGTTATCACA TGAATTCGTT 300 

ATCCCGCACA CAAAAACCAC GGGCAGGTTT TACCCTGCTG GAAGTGATGG TGGTGATTGT 360

TATTCTTGGC GTCCTGGCAA GTCTGGTGGT GCCTAACCTG TTGGGCAACA AAGAGAAARC 420

CGATCGGCAA AAAGCCATCA GCGATATCGT GGCGCTGGAG AATGCGCTGG ATATGTACCG 480

ACTGGATAAC GGGCGTTATC CGACCACTGA GCAGGGGCTT GAGGCGCTGA TCCAGCAACC 540

GGCCAATATG GCGGATTCCC GTAACTACCG TACCGGTGGA TACATTAAAC GACTGCCAAA 600 

GGATCCGTGG GGCAATGATT ATCAGTATCT CAGCCCGGGT GAAAAAGGGC TGTTTGATGT 660 

TTATACCCTG GGGGCAGATG GTCAGGAAAA TGGGGAGGGC GCTGGCGCAG ATATCGGTAA 720 

CTGGAATTTG CAGGAGTTTC AGTAATCAGT GCCTGAACGC GGATTCACAC TTCTGGAAAT 780 

CATGCTGGTG ATTTTCCTTA TCGGCCTTGC CAGTGCGGGC GTGATACAGA CGTTTGCGAC 840

CGCTTCAGAG CCGCCTGCGA AAAAAGCGGC GCAGGATTTT CTGACTCGCT TTGCGCAGTT 900 

TAAGGACAGG GCAGTGATCG AAGGGCAAAC ACTCGGTGTG CTAATCGACC CGCCTGGCTA 960 

TCAGTTTATG CAGCGTCGTC ACGGACAGTG GCTACCCGTT TCTGCGACCC GCTTATCGAC 1020 

ACAGGTTACG GTGCCAAAAC AGGTGCAGAT GCTGTTACAA CCCGGCAGTG ATATCTGGCA 1080

GAAGGAGTAT GCGCTGGAGC TGCAACGTCG TCGCCTGACG CTGCACGATA TTGAACTGGA 1140

GTTGCAAAAA GAGGCGAAAA AGAAGACGCC ACAGATCCGT TTTTCGCCTT TTGAACCCGC 1200 

CACGCCGTTT ACGCTGCGCT TCTACTCAGC GGCGCAAAAC GCATGTTGGG CGGTAAAACT 1260

GGCACACGAT GGCGCGTTAT CCCTCAGTCA ATGTGATGAG AGGATGCCAT GAAGCGTGGA 1320 

TTTACCTTGC TGGAAGTGAT GCTCGCGCTG GCGATTTTTG CGCTGGCTGC CACGGCGGTG 1380

TTACAGATTG CCAGCGGCGC GCTGAGTAAT CAGCACGTTC TTGAGGAAAA AACGGTAGCG 1440

GGCTGGGTAG CTGAAAACCA GACCGCACTG CTCTACCTGA TGACCCGCGA ACAACGGGCG 1500 

GTCAGGCACC AGGGCGAGAG CGATATGGCA GGAAGCCGCT GGKTCTGGCG AACCACACCA 1560 

CTGAATACCG GTAATGCGCT 1580 
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3241 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 

CTTAACCATT ACCCAGCATT TGGTAGTTAA ATAGTCGTTA AAAGCATAAA ACATGGACAT 60 

TGTGCCATCC CAGCTAAAGC ATCCATTACC GCCTGACAGG GATAAAAATA AAAAAGCAGG 120 

GAACCATTTT TTCATCAGAA ATCACTTCCG TAATTACAGT TATTCATTTA GGTATGACTC 180 

AGTTATAAAT CATGCTCATA CTGGCCGTGG TCTGGRAATC CCCGCCATTC AGTATCCCGC 240

TGCCATTACG AAAGGGCACT GAAGTAAAGG TGAACGTTGA ACGTGCTGTG TCCAGACCTG 300 

CTGTCACTCC GTAACCATTT CCTGAACCAT TACCTAATAT AAGAGGTGTT GACATTCCTT 360 

TTCCCTGATA CAGCGCTATA CCAAAATGAG TTATATTTGT TGCCAGTACA TTATTCTGAC 420

CTCCTCCCAT AGTATTTCCC GTAACTTTTA TCCAGAGAGA GCCACTCTTA TACGGACAGG 480

ATATGCTTAT GGTTTTTGTG ACTTCACCAC GTGAGTTGTC CACGTGCTCA GGATTAATAT 540

TCCCAAAATC AACAACAATA TTCTGCCCGT TATTAATGGT GCATGGGGGG ATATAAACAT 600 

TCCCCCTGAT GTTAATCTGC ACATCAGCCA GTACAGCGAC CGATGTCAGA AGCAACGATA 660 

TAAATAATGA TAAACGAATC ATTCCCCTCC GGAGAGCGGT ACAGAAAACA TTTTATTTTA 720

CGAGATATAA AATTAACGTA TTTTAGTTGA TACTATTACG AATATGATGC AACCAGCGTT 780

GCTGTTGCAG AGAAAGGACC GGCTATCAAA TTCTGCATAT TCCCTTTATA TCCAAGTTTG 840

GCATGAAGTG ATATAGTTTT ATCTGCATTA TTACCTGTGA TTTTTCCGGG CGTAAATGGA 900 

GTCCCTAAAG TTATCGCAGT CCCAATATTT CCTGCATTAC TGTTATAAAG ATAAACGAGT 960 

AACCCATCAG AAGATGTGTT TGATGTATTC TGAACTAAAA TAGCATTGTT ATAAGTGTTT 1020 
GTTGCCGTTA TCGTAACCTT CATTGTTCCC AGATTATAGG GACACCGCAT ATTCACAGTA 1080

AACTCTTTTT CGTGATTTCC ATTTTGACTC AGGGTCTGAA TCTCTACATC CTGCCAGTCA 1140 

ACAGTTGTGT TGCTTACAGT ACAGGCAGGA ATAATCAGTT TTCCTCTGAA GGTCAGATTA 1200 

TCAACTGCAT GTACATGCTG AGACATTAAC ACTGCGCCCA GCATTACCGG AAGACACAAA 1260

CCTCTTATCT TTTTCATCTG AAATATCCTG TACAAAAATT TTGCTAACGA TATGTCAATT 1320

CAAACGTGGC TGTTGCTTCA TAATCACCGG GTACCACACT CTTCGTCCGC AGGCTTCCGG 1380

CGTTGCCACA ACATACGCGC CGAAAGGAAG CTCAAGACTG TTTCCGGTAA CCTTTTCCCC 1440

CTGGCCTTTG TTATGGGAGG TGCCGGGTTT CAGCAGACTG CTGCCATCGG TGTCCAGCAG 1500 

TGCAATGCCT AACCGGCCAG CATTCACTCC GGTTACCTTC AGATGGCCCG GGAGGGCGCC 1560 

TCTTCCGTCC CCTTAAAGGT CAGGGTCACA ATTTTGCCAA CTGCTGTTGC ATGGCAGTTT 1620 

TCCAGCCTGA TGACAAACGA CTCTGTCGGG GAACGTCCGG GCGGATACCA GAAATCCCTG 1680

GACGCCCGGG TTTTGAAGAC GACATGTTTA TTCAGACTGT CACCGGACAC ATGGCAGGGT 1740

CTGTCAAGCA GATTACCCCT GAATGCCACA TCTGAGGCTA TTGCCTGTCC GGCAGACAGT 1800

GCGGCAAACA GTAAAAGAGC GCCTGTGCTT TTTATCATCA CATTCCCTTA CTCATATTTT 1860

ATGCTCAGAC GCAGCATGGC CGGATTGCTC CTGGCATCAG AATACTCACC CTCCTGTGTC 1920

GCCCTTTTCC TCCAGGCGGC CAGCATCTCC TCCTGCCGCC GGTCAGGCCG GCACAGTAAA 1980 

AAGGTATCAC CATCGTGTAT AACAAGATGG TCACAGCCGG ATAGCTTACG GTCAGGAAGT 2040

AAAGCACTTC CGCTTCCGGG ACCGGTTACC AGTGAGCCGG AGACTGTCAT CGCAACGCCC 2100 

CGTTTTCCGG GCTGAAGTGC ACCACCGTCC CCACATCCTG CCAGCCTCAG CATCAGAGGT 2160 

GCTCCGGCTG CCGCAGAGTG ATTTTCCGGC CGGAGGYTTA ACGGCACCTC ATTACTCACC 2220 

AGCGTGCAGG GTGAGGACAG CAGTGCACCA CTGACGGTCA GGCTTCCGGT GCGTCCCCCC 2280 

CGTTCATTTA TCCGGTAATG ACGCAACTCA TCTGCAGTAA AGACGTCATC GTATATACCC 2340

CGCTCTTCAG CCCGCAGGAA AGTATGGATG AAACCACTCA GCGACAGTGC AATAAGATAC 2400

AGTACTGCTG TTGTTTTATT CACAACCATA ATATCCCACC CGCATTTAAC CGTTATTGCG 2460

GTACATTATT TCTCTTTTTT CACAGAGCAA CGGCTACCAT TACAGATAAA CGACAGTACC 2520

GGGCGACCAC CATAGTCATT AATATAAGAC AGATAAGGGG TATTATAATT TGCCGATTTT 2580

ACTGTCTGCT CTGAACGGGG AGACAGCATC ACGGTTTCAA ACTCACCTTC CTCTGCCTGC 2640

TTTTCACTTC CTCCCAGACC AATAACAGTG ACATAATAGG GCGTTGGGTT TTCAATACGA 2700 

TACCCACCGC TGACTTTGTT CAGAATTAAC TGGTCCTGCC ATACTTCATT TGGTCTGGTT 2760

TTAATTGCTG CCGGGCGATA AAAAAGCTTT ATTTTGGTCT GTAAGGCTAT CTGCAGTACA 2820 

TTGGCCTTTT CACTCCTCGG CGGTATTTCC CTGAGATTAA AATAAAACAG TGATTCCCTG 2880

TCCTGAGGAA GTTTACTGAT ATCCGGTGTG GTACTCAGCC TGACCATGCT TTTCGCACCC 2940

GGCTCAAGGC GCTGAACCGG AGGGvj I (jLiL-H. CTGTAATAAT TTTTTCCTGA 3000

TTTTCATTTT CTATCCATGC CTGAGCAAGA TAGGGCAGTT GTTTGTTATC ATTGGAGATA 3060

TCAAGCGTCA TTGACTTCTC ACTCCCGTCA AACACCGCGC GGGTTCTGTC CAGCGAAACA 3120

GCAGCGTCTG CCCCGGATAT AACAAACAGG GGGATGGCAG CCATCAGAAT CTTTTTTCGA 3180

ATCATACTTA ATTTCCACAT TCTGTAATTT CACCTGGTCC GGAAAATGGC ATAACCGCAT 3240

3241



(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 398 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
AACGTGGATC TCCAGCTGAT CGGTGCCGTA TTCCAGGTCG TAAGTTTCAC TGATGGTTTC 60

ACGCGGCAGT TTGCCCGGTT TACGGACCGG TACAAAGCCA ACGCCCAGAC CCAGAGCTAC 120

CGGAGCGCCA AACAAGAAGC CACGCGCTTC GGTGCCGACA ACTTTGGTAA TGCCCGCATT 180

TTTGTAACGC TCAACCAGCA AGTCGATGCT GAGAGCGTAA TTTTCGGGTC TTCCAGTAAG 240

CTGGTGACAT CGCGGAAAAG AATGCCGGGT TTTGGGTAGT CCTGAATGCT TTTGATGCTA 300

TTTTTGAGAT ACTCAAGCTG CTGTGCATCG CGGGKCATAA GTGTATGCCT GCTTGTTACG 360

GTGGTACTCA CGGCGCGTTT TTAAACGTAT CAAAAGTT 398

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17710 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 



CAGTTNCNGT TCTCATAGAC AGATTGATAA AATCGTAAAC AGCCCCTAGC ATTCCCGTTT 60

CCTTTGCACA CATATTCAGG CACGGGGATA AAGTATAAAG AATGTCGTAC TGCTGCTACC 120

AGAGCAATAT TCCCCCCTGA TGGCCGTATC AGAGATAGTA TGCCGGTATT TTGCGGGTGG 180

TTCCCGTCAG GTTATCGTGT ACCTCCACGG TCGTAGTCAC CACCGGCATT CCGGCYTTTC 240

TCAGCCTCAA AACATCAGCT GCAATACGCT GACTGCCGAA CCAGAACAGG CCGTCCAGTG 300

CAGTCACCAG CAACCCCGCC TCCAGCGCAT GCTTCAGCCG TTCACGGGGC GCTTTCACTT 360

CCCGGGCAAT CTGCTGGTAT GGCGATGATG TGTTTTCATT CCCAATCACC CGGCGAATAC 420

GATGAGACAG ATGATACCGG TATGTATCCG GCACACGGGA AAGGCTGGCC TTCAGGCTGT 480

ACACGCAGCC AAATCGTTTA TCATTGAACA CGACATTTTT CTGGCTGATG CCCCATTCTT 540

CACGCAGCGC GGCAATCAGT TGTGGTGTAC GGGTAAGCAA CAAGCGAAAA GGCAGTTCAA 600

AACTGGTGAC ATAATCCACA TTCAACAGGG GAATGCGAAG TCGTTCTTCT GGTCCGGCTT 660 

CTGTCTGCCG GCACTCCTCC AGGACATCCT GCCAGTGCAG GCGAAGACGG GAAGACTGAT 720 

TCAGTTCTGT AAAGCAGTAT TTATCCGCCA GATAGTCAAT TCGTGTATGC ATACTGAAGA 780

GTATTCCGTA TAAAGATTCA GCTGGGAAAA CTTTATCAGT CTGTAAAAAC TAACGGAAGA 840

GTCGATATTT CTCCCGACAA TCACCGGATG ATTGTTGCAA TACGTCGTGG CATCAGAGAC 900

TGAACAGGAG TTTTTAACGC AACGTATTGC TCTGATGTAT CAGGCCGGAC AACCCGAAAA 960 

CAGCCTTCCA CCCGGCATTG TCCGCCAGCG CTTATCACCG GCCAGGTCTG TTGCAGTAAA 1020 

TCCGCGACTT GCGAACATGC TTCATCAACT GTGACACTGG CCCGCGGATG GCAAATGCTC 1080 

GTCTGGCTGA GCAGCAACAG GCATCGCATT GTTGCTCCTC TATGTTGTTC CCGCAACCAG 1140

CGTAATACCA CCGGCGAGGA TGGACAGGCA GTGTGATTAC GCTCCGTAAT ACGTTCGTGC 1200 

ACCCGTCGGT GAAAGGAACT ACAGAATGTC TGAATCTGTT GGCCGTTGAT GTATCCTTCT 1260

GTCGAATGAA GTGTGAAGTG GATTGCCAGC AGATGCGGCC AGTGATCCAC CGCGTGCTGA 1320 

ACAAAACGCC GGATTTCCCC CGGCTCTGAA AGTAAGGCTT CGGTTATTTG CACTATTTTA 1380 

TCTCTGTTGA ATTTGGTTAA GTCGGTGCAG ACGCATCAAC ACAAGTACGG TTCGATGCAA 1440

ACAGCTGTGA CTGGCAATAT GAAAGGAATG ATGAATCAGT CAGGATGACA AAGTGCCGGC 1500 

TGACCGGAGG GGACGCAGGA AGATTCACGG GGGGACCAGC ACCAGGGAAC AGCGCCACAA 1560

TACCAGCGCT GACACGTTGA ACATTGCCAG CGTACCGGTA TCACAACACG TTTCATACTT 1620 

CTGCCCGCGT GATTCTTCGA TTCGTTACTG TATCTACTGT GACACTTCGC TTTTATACCT 1680

GCGGCTGGAT CGGCCCGGCT TGATGAATCT TCACTGATCA GCTTATAAAA CCCTCTGTCG 1740

GTCATACCGG TGAAACTGGT GATATAGTTC ATGTCAATCA GGGAATTATC GGCACGCAGA 1800

AATACGCTGT CGTGGCTTGT TGTAGTCAAC ATGGTCAGAA TGTCCTCTGT GAGATTTATG 1860

AAGATTGTGC GAATGCGGGG AATCTACTGA GCTGTGCTTT CAGAAGTGGG CTGTTACGGG 1920 

AKRSCAGGGA TTACCGGCGG GGTAACGGGC TTCCGGATCA TACACACCAC GATTATCGCG 1980 

GACAAAATCA CTGAACGCCC ATATCACCTC TTTAAGTATG TCTTCGCAGC CCGGTACATG 2040

ACGATCCAGC GCCACATCCC GAGTGGTACT ACTTTGATGC GCCCGGTGAC ACAAAGCCCG 2100 

GATTGTTCCA GACATCCTGA ATCAAACGCC CCAGATTAGG GGCGTCGAAA TATGCCTCTC 2160 

TGACCATTAT ATTCCGGTGT ACAGGTAGCA GGTCAGAAGT GACAATGCGT CACCTGACGT 2220 

TAAAAGTCAC TACACCCAAG ATGACGTTCA ACAGCACCAT GCGATTCAAT GTAAGCCCGG 2280

GCTGTCTGTT CCAGTACACC AGGCTCAGCG TTGTATGTGT TAGCTGCATC AAATACCAAC 2340

GACAGCACTT CAGGATACAC AACCAGATGT GTAATGGAGT TATCTTCACC CAATACTTTT 2400

CCCCACGCCT GCTCAATCAG ATTTCTGAGA ACCACCACCT CACGACTCTT ACACCAGACA 2460

TCGTTATTAA GTAGCAGCAC CATAAGATAA GGAGTGGTAT CGTTAGTCAC AGCCTCCCTA 2520 

CTCCAGAGAT AATATAAAGG GGTGGGCTCA ACAGATTTAT CTTTACGTCG CTTACACTGC 2580

AAATATTCAG AAATGAGTCT ATGCAGTTCA CCAGTAAAAT CCGCCATCAG AGAGGGAATG 2640

GCCTTATTAA TACCAGGGCA AGGTATTAAT TTAAATTGTA ATAATTTAAT TTCAGGATGT 2700

GTGGCTGCAG CCCGATACAG AGTTGCAAGG ACACACTTTT GCCAGAGGGC GTTACTGGAA 2760

AGCTTAACGT TTGATTCTGT ATACATAATA AATCACCTTA CAGTTACAAC AGGTCAAAAA 2820

CCGCTGTAGC CAGAGTTACG CTGGCCTGAT GCTTTAGTAC CGGGCTTCGT CAGATAATCC 2880 

AGACGCTCCA ATAAGCGCTG ATACTGCTCA GGGAAATCAG GATCATGAAT ATCCTGGATG 2940

TCACGTCCAT TAGCAGGGAA ATGAATAACG CAGCCCCCTG GATTAACAAT GCAGAAATCG 3000 

TCCTGAGGTA CTGATCAATA CGGAGAGGAC TCTCGCGTGT GGTTTATTGA CACCACAGTG 3060 

CAGATTCGGC GAATCCGCGA TCACGGTGCG ATTTCGTTCC ACAGCACACA ATCATGACCC 3120

CGGGTTTTAT TCAGGTAAGC AGGATTGCGG ATATCCGGTG TCGCGCCTTT CTGTCACGAA 3180 

CGGGGTAGGT GCGAAACACC GGATAAAATG CAGGCTGGCA ATACCTCTGA ACGCCCTGCG 3240

CAGAGCGGAT ATTTTGGATT AAGTACTCGC ACCTCCGCAG TCCTGAAACA AGTCTGGCTG 3300 

GTAGCTGTAA ACAGACTTCG TACATGTTGC TCTGGAATAG ATCCCCGTGC CACAGGCTTC 3360 

GCAGAACTTT TTCCCGGGAA AATGCTGCCC GCACATCACA CAATGCCACT CCAGCACGAC 3420 

CGGTAATGGC GATAGAAACA TCGCCATATC CTCAATGTAA GGGTGGGACT TTTCCGGATT 3480

CAGCACCACG CAGGCCGCCT TCTGTTGCGC GCTCAGGGCA TGTAAATCGT GCTCAAACCA 3540

CGCCCCCTGA GCATCTGTCT GCAAAATCAA CCGACCACGA CAGGAAAGGC AGAAACAATG 3600

CCTGATATTT CTGCTAAGGC TGAGGCCGCA CTGATAATGT GTTCACCCGG CGTGATCCCC 3660

AGCCCCGTTT TTATACCGTT CATTCAGCCA CTCCCTCCTC ACTGAAGTGC CCTGTATGGC 3720

AGTGAGTGCA GTACCGCTCC CCATAATAAT CGTGGTGACA TTGTCTGCAG TGCCAGCTGG 3780

CTTTACGCAC CACGGGTAAG GCATCCGGTA CGAATTTCTG CAGACGCTTA ATCAGTTGTA 3840

TTTCTCTGCG CTCCGGTCTG ACATAAGGGC ACTGTTGACC GTGCTCCGTC AGCCCGTCGT 3900 

CAGTGTGTTC AAACCAGGGA AGTTCAGTGT CGTATTGCGG ATGGTATCTG AGCGCACTGC 3960 
CGCAAAGGTG GCAGGTGTAG CGGTCGTAAG GTGCAGTCTG TGCGGTACGG GCAGCGGTCA 4020

GACGTCCGTT GCCATCAAAT GCGAGAAAAG ATTTTGCGTA CATAGTATAT GTTCCTTACC 4080

GCCAGACGAC ACGCAGGCGT CAGCGTCCCT TTACGGGCAG CGTGGGCAGG GTGTGAATGG 4140

CGGTACAGTT AAGGGGGGGG TGGAAAATGG GCGGGCTGTT GTTACAGCAC TGTGGATGTC 4200

ACATCATGGC GTACCAACGT AAAAAATAAT CAGCAGGCCC GGATACATCG TTGTCGCCGG 4260

ACATCAGCCC GTCCTGCTGG TTTTGCCGGG CTCAGCCCCG ACTGCAGGGG AAATTACGCT 4320

CACCAGTGGC GTGAGCTTTG GTATGTTCCT TCGCCAGATA GTCAGCACGT TCCAGCACCT 4380

GCTGAAAGCC AGTGTCATCA CCGCGTTCCA GCCACACCGC CGGCGTGTCA GGAAAATGCG 4440

CCAACGTGGC ATAAGGCCCG GCATCCACCC CCAGGGCACT GCACCAGGCN TGWTTAATCA 4500

TCCCGGCCAG TGACCCCGGA TCGCGGTAAT CGCCGGCACG ACACCAGGTA TCCCGGTTGA 4560

CCAGCAGCAG GAGGTGATAG TGTTTTTTGC CCCTGAGTAC CCCGAACTCC CGGGCCCAGG 4620

CGTAATGCAG GGTGGTGGGA TGCACGCGTT TACCTTCACG NCGTTACGCT TCTGGTAAGC 4680

GTCGATTCGG GCTTTCAGGG CATTGATGAA GCGGGATATC ACAGCCGCGT CCGTAGCTGC 4740 

CGGTACATCC GGGAGACGCA GATCAACCCG AAGTGCCGTC AGGCGGGGAT GAACATTCAG 4800

TGCGTGCCGC ACCGTCTCAC GAATACGTTG CTGCCAGAAG GGGTTGTATT TGTAGGTCAT 4860

GGTTAAATCT CCGTATGGTT CATACGGAAT AGCCACGTCG TAAAAAATGC GCAGAGCCCC 4920

TGACGTGGCC ACCGACAGAA CACGGCCTCA GGCGCGTTGT GATAACCCAG CTATCGTTTC 4980

CGGACTGACG GTTGAATTTC CTGCGTTGTT TTCTTAATGT AAAAAACCTG CTACGGGTAA 5040

GGCTGTGAGG AGGAAGTGAT GGTGATACGC AAAAAGAAGT GCAGGGACTG CGGAGAAGCG 5100 

ACAGAGCATA ACACGGTATG TTGCCCACAC TGCGGTTCTG TCGATCCCTT CGGCTATTAC 5160 

CGCAATACAG ACAGAATATT CACCCTCCTG ATGGTCCTGC TGGTTGTGGT TCTGCTGATG 5220 

ACGGCTGCGG TCAGCGTGTA TGTGCTGTGG TAGTCGGAGG GGCAGGGAGC AGACGATGAC 5280 

GTAAAATATC TCCGGTGCTC AGATATCACG GCCGGTCAGA CCGCAAACCA ACGGTTAATC 5340

GTAACGGGAT CAGGCAAATG TGTGATTAGC CCCCTGGCGC TCATACCCGC ACCGCAGACC 5400

ACCTTAAGTA CTTCCCGCCC GACACCATTC CCTGCTCCCG GATAATTTGT TGTCGCTATA 5460

CCGCTTAACA TCACCGATAC CACACCGGCG CAGATAGGAC CGGATTCATT GTAGAGATGA 5520 

CTTAAGGTTC AGGTAACATA TTTCCAGACA GAAGCGGGAA CACGATCGTA AAGTTTGTTC 5580 

ATGGTCAGTT CTGCCAGCCG GTGATCAACC GCAGAGTTGA AATTTTCCAG CTCCGCCGGG 5640

GTGAGTTTAT ACCGTGCGTG GGAAATCACT TTTTCCAGTG TCTCCCGGGA TGAACAACGA 5700 

CGGAACTGAT ACAGCCAGTC TTCTTTGGTT TTTACTTCCA TTCGTCTCTC GTTACTTTAT 5760

GGTGCGGTTA ACAGGATGCC GTCAGTATAC CGCATGCAGA CACTGTCCCG CTCCCCCGCT 5820 

TGCTGCGATA CAACTTAACG TTTCAGGAAT CCAGTCATCG CACCGGGAAA GGCTTTCTGG 5880 

TGACAGGAAA CGTCAGGAAC AGGAGTTTCT CAGACTCCCA CTCATCGGAT CAGGCTCAGA 5940

CAGGATTATT AATACGCTCA GTTCATGTGT CATATACAGG GCATCGGGGA TGAATATATG 6000 

GGTATAACTC AGAGCCTGTA CTACAGCTTT CACTGCTGAC TGATTTTACG TATCAGCGTT 6060 

CATGTATCTG CACTCTGATA TAGAATACTT CTACCGGAGC TACTCTTACG TTAGCTCACT 6120 

CTCACATCAG GCAACATCAC TTATTCAGCT CACTTACCTC TTACCACTCA CTACTTCTTT 6180 
ATATTTATAA TATCAATCAG ACAGCCTTAT CCCCCCGGTA ATATCTGTTG CCTTCCCGCC 6240

AGCCACAGGC TTATTCACCA CAACCACCTG CGATAACAAC TCTGCAATTA TCAGAACGCC 6300 

TGCTTCTCTC CCTGTCCTCA CGAAAACTAT CCCCTCTTTA TCGCGCGTGC GTGCGGAAGC 6360 

ATCTTTTCGC AACAACCACC CGGGATTCCG CTACGGCTCT GCCATCGCAA TCCCCCCGTT 6420

TATCTCCGGA CAGCCACATT CCCGATTATT TTTTACGTTT CTCCCCGGTT GTTATGCCGG 6480

TGAAGGTGGT GCGTCGTTTT CATCACCACA CCGGTTGCGA TTAACAACAT CCGGAGGAAC 6540

ATTCTCATGA CCACACCCTT TTCACTGATG GATGACCAGA TGGTCGACAT GGCGTTTATC 6600

ACTCAACTGA CCGGCCTGAG CGATAAGTGG TTTTACAAAC TCATCCAGGA CGGAGCCTTT 6660 

CCGGCCCCCA TCAAACTGGG CCGCAGCTCC CGCTGGCTGA AAAGTGAAGT GGAAGCCTGG 6720 

CTGCAGGCGC GTATTACACA GTCCCGTCCG TAATTTCTGC CCCTTATCCG TTCACCCGCA 6780

GCAGACGCCT CCCCGGCCTG CCGTTGACAT TCTGCTGCCT GTTTTATCCC CGTGAGGAAT 6840

ATGAAAATGA AACAACAGTA CCAGACCCGC TACGAATGGC TCCACGAAAG CTACCAGAAA 6900 

TGGCTGACCG GCTTCAMCCG GCACGCCGTA TCCTGGGGCG TGTGTCATCC GAATATCTAC 6960 

TATTTCCATA ATCTGACGCC CGGGTGGGTG TCATTCAACG GCGAACAGTC GGAGATTGCC 7020

ATTGTTCCCG GCAGTCTGCA CCGGCTGATT TATGGTCATG ACAAACGGGC CATGCCGCCC 7080

CTGGATGATG ATCTGGTGGT GAATTTATGC ACCAGTGAGA ATCTGCTGGT TCATCATCCG 7140

ATGCTGGAAG GCATTCTGCT GTCTGAGTGC ACGCGCCTGC ATAAAAAATC ACTGGCGAAC 7200 

AAACTGATCA GTATATTCCG TCAGTTTGAC GGGACGGAGC TGCGTCTCAA ACTGGTCTGG 7260

CTTTGCTGGT TTGATTTAAT GACCGGAAAC TGCCTTGACG ACTGGACGGA GAACCTGNAA 7320

CGGAAATCAG AAAAAGAGCT GGAGAAATGG ATCATTGAGC GCCAGAACCG GAACGCACCG 7380

CTGACGAATC TGATGGATCA GTACGTGCTC CTGGCATTCC GCACAACGGT TGACGATAGC 7440

CGCAACTGAT GTCTGCATGC TGCCSGCTGA AGCCATATTC ACGGGGCAGG GACGCCCCTG 7500

CTTCCGCAAC AATCCGGGGT AATGGCGACG TACGCCTGCA GAGTGTGTTC ATCGTTGTCA 7560

CAGCCGGACA AGGTGAATAC CGTTGATGAT GCGGGGATGA ACCTGCTGGT CCACCGCGCT 7620

GTCACTCAGA CGCGTCAGCG TGTATGGACG CCCCGATCGA ATGGTTCTTC CGCCAGAGTG 7680

CACAGAAATG AGGCACGGAA CGTTACCTGA AGGGTGACCG GCACGGACTG CAACTTGTTG 7740

CCATTGATGG CGCACAAGTC ACATACAGCA GAATGTCGTG ACCGCACCTT ACCGGTGAAG 7800

CGAAACGGTG CTGCCCCACT CCACCACCAT CCCGGATAAC GCCATTACGC TGTCTGATAA 7860

GCGCTTTTAC AGCGCAAATC TGGTGCAGAA AAGCGTAAAG CTGACCTGCC GGAGCAGGAT 7920

GTGGGCATGT TGCGGGCTTA CAACCTGATA CGGCATGAGG CACTAAAAGC AGCATCAGAA 7980

ATCAGCCTGA GTTCGCGTTC CGGTTTATCC CGACAGAGAG GACAGTGCCG GGCAACACGG 8040 

TGTCACCGGG GAGCATCCCG AAACGACCGG AGCATCTGCG GGATGCTCTG TAAGTGGTGT 8100 






WO 98/22575 



PCT/US97/21347 




TAAGGTGGGC GGTTAAGGTA TCAAAAAAAT CGTTATCCTG TGAAAGACAG TGCGCTCTGC 8160

TGAAGTGAAC GTCACTGCCG GGAAGCATCG GGTTTCGCTA CCGGACAGTC GCGGTAACGC 8220 

GTTTACCGGC ATCTGTCTGT GTGGCAGGGA TGGCTGATAT TGTCGGTTAT ACCAGCGGCA 8280 

GGTGCGTCCT GTTATCTGTA AAATCAGGGC GTGCCGGTAC ACAACGCCTC GTTGATGGCG 8340

GTCACTGAAC GAATCATCCT CTGACGAAAA GAACCGTCGA TACAACGCCG GGGTAAAAAG 8400

AAAACCGGAA ACCATCTTGT GCACGACAGG TACTCAGGGG GGTATAACGC CTGCGCACCA 8460

TCACATCCGG GAACAGGGCT GCTCGTCAGT GTCTTCGTGT GGCGAAGCAT CTGCAACCGG 8520 

ACGGTACTGC CCTCAGAGCA ATCTCCCTGC TGCAGTGCAC AGAGTAAGCC GGAAAGCTGG 8580 

TGAATGCCGC CATGACACAC TGCGACGTGG AGAAACAAAC GACAGACTCC GTCCGCAGTA 8640

ACACTGAAGG TAGTCCCGCA AACCTCAGAC TTCTTCCTGC ACGTTATCAG CGGACTGAAC 8700 

CCCGGTCAGC CACTTAAACC TGCTAATCGT GTTGCTGCAT ACCCGGCCGG CCGGAAGGTG 8760

TTATGAAGCC CGCGACCGGA GCGCTTCTGC AAATATCCGG GGAGATAAAA TTTTCGTGAC 8820 

AGGATGACGG TCGTGCTGCA GACGTAAAGC CGCAGGAGCG GACACGACAG ACAGTGTTCA 8880 

CTGTGGCGTC CTTTGCCGTC GGTATCGTGC TCACGCTGAG GTCCCGGGGG TACACCTGAC 8940

GACAAATACC TGCGATTCCC GGGACGGTCT GTTCTCCGTA AAATAAAGAA AATGCGGGAT 9000 

GCCTCCCGGA CTGCAGAGAA GAGGGATTGA CAGACAGTGT ATATTGCGTA CGATTACAGG 9060 

GGAAAAACAC AGTAAATATG GAGGTCAGGT CCGAAAACAA CCTACGAAAT TTCTATGAAA 9120 

AACGATTGAA AAAATCATCA AATTCAGTTC GTTTTTCTAT GGTAATTTTT AAACACTCCC 9180

GATGATAACC TGTTGTATGT GCATGTGGGG AACGCACCGA AAACATCAGA ATCATCTGAA 9240

AAAAACAACG AACACACCAG AAAAACAGGA GCAACCATAA CGAAGCAACA TATTGATTTT 9300 

AAACAGAATT TAAGGTTAAC AGACAAAAAA CACTTTCAAC TGAAGGAGAA ATATACACTG 9360 

GCGACAGTGC AGGGTTTTTC ATGCAAAAAA AATGAGCTTT TATCTCCGGC GCATACTGAC 9420

CGGGATGCAG CCATGACAGA GCAAAAACCA TTAAATATCA GGAGGTTAAA CACACAAAAA 9480

GCTGACATGC ATCAGGGAGC AATCCCTCAC AACAGAGGCT GAGCGGCAAC GCTTCCTCAC 9540

AGGACGGCAT TCCTGAAAGG ACAGGCAGCC ACGGCTTTTT ACTGCCCGTA TCCGGTATAT 9600 

TTATCTGCCG TGACGTGCAG AGGATTTTGT GTTTCCGGAA ATCAGGAAAA CAGGAGAACC 9660

GCGGGAGATA TGATGGAAAA AGAACCGGAT GATATCTGCG CAGAGTGTCC GAATATTGAT 9720

GCAATAAAAC GGCACAAACA ACAGGCCGGA GCCATCAGGG AATACACTGA GTGGTTAAAA 9780

AAACAACCGC GTGCTTCTTA CTTTTTTCTC TTCCGGTTGT ACGCATACCT TCAGAATGAA 9840

GTGATATCCC GAAAACAAAA ACATTCGCTC ACCAGCGATA ACAGCCATCC CCCGGAATCT 9900 

GATGTCACCC CTCCGGATTT AACCCTTCCC CGTCGCTACT ACTGTGATTA CGGTTACACG 9960 

CCCTACCCCA TGATGGGCGG ACAGATGTCT GTTTTTGCCA CAACGTCAGA AACCACCAGT 10020 
TCGACGAATG CAGTCCCCGG AAACGCAGTT ACCGGGAATG AGACTGAAAA GCATGAAAAC 10080

GCGGTACCGG CGACATTCCC CGTCAGCCGT TCTGCAATGC CCCCGGAACC TCTGCGGTTT 10140

GCCACGGGTT TTCCATCGCA ACCACTGCTT GCCGGTCCCC GGGAAAAGCC GATGCGCACC 10200 

GTGCATCCTG ACATCCACAG CGAAATTATA TGGTTCTGCT CCACTTACCT GCTGAAATCC 10260

GGACCACAGA TTACGAAGAC GATTATCAAC TCAGTATTCT CTGAATGGGC CCGCATCAGC 10320 

AATGATTACC CCTCCCCCTT TTCGTGGGTG GACAGCAGGG ACAGTGAACA GTGTGACTGG 10380 

TTATGGAACG CCATGCAGCT CCGGTGTGTG GGAACCCCGC TGAATCCCCT TACCCCGGAG 10440 

CAGAAATACT GGTTTGCCTG CGCCACGTTT GATAACTGGG AGGGCTGGAA TGAGCAACAG 10500 

ATACAGTTTT TACTGAAAAG TAATCCCAGA CGAAACAGAG CGAAGTTTAC GGTCACCTTC 10560

GGCCCTCCCT GGATTCAGCA TAAAGCCATT CTTCTTGATG AGCTGAAGAG TGCCCGGGAG 10620

CAACAAAAAA GGCGCGATGA ACGCGCTGAT GGTTCCGTCC CGCTGAAACT GTCCGGAAAA 10680

ATCCACAAAC ACCTTGAAAG TATTGCCCGG AGTCGTGGTA TCCCCCCAAA AAAACTGCTG 10740 

AATGAAATGA TTGAGCAGGC GTACCAGGAC TCAGTGGTGA ACAGCCGGAA TAAACCACTG 10800 

ATTTAAAATA ATTTCAGACA GATATTATCT CCGTGAATCC CCCGCCACCT TTCCGGTGCG 10860

CGGGGTTTTG TCTTTTTTCA CCGGGAATAC ATGTATGAAT CCGTCTGATG CCATTGAGGC 10920 

AATTGAAAAA CCGCTCTCCT CCCTGCCTTA CTCGCTTTCC CGTCACATCC TGGAACATCT 10980

GCGCAAACTC ACCCGTCACG AACCCGTGAT TGGCATTATG GGTAAAAGCG GGGCCGGTAA 11040 

ATCCTCACTC TGTAATGCAC TGTTTCAGGG GGAGGTCACC CCGGTCAGTG ATGTTCACGC 11100 

CGGCACCCGG GAAGTGCGGC GCTTCCGTCT GAGTGGCCAT GGTCACAACA TGGTTATCAC 11160 

TGACCTGCCC GGGGTGGGCG AGAGCNGGGA CAGGGATGCA GAGTATGAAG CCCTGTACCG 11220 

TGACATTCTG CCTGAACTGG ACCTGGTACT GTGGCTGATT AAAGCCGATG ACCGTGCCCT 11280 

GTCTGTGGAT GAGTATTTCT GGCGACACAT CCTGCAACGC GGACATCAGC AGGTGCTGTT 11340 

TGTGGTGACG CAGGCCGACA AAACGGAGCC CTGCCATGAA TGGGATATGG CCGGCATTCA 11400 

GCCCTCTCCC GCACAGGCAC AGAACATTCG CGAAAAAACG GAGGCGGTAT TCCGTCTGTT 11460

CCGGCCTGTA CATCCGGTTG TGGCCGTATC GGCCCGCACC GGCTGGGAAC TGGATACGCT 11520 

GGTCAGTGCA CTCATGACAG CGCTTCCCGA CCATGCCGCC AGTCCCCTGA TGACCCGACT 11580 

GCAGGACGAG CTGCGCACGG AGTCTGTCCG CGCTCAGGCC CGTGAACAGT TTACCGGTGC 11640 

GGTGGACCGG ATATTTGACA CAGCGGAGAG CGTCTGTGTT GCCTCTGTTG TCCGTACGGC 11700 

CCTGCGCGCT GTTCGTGACA CCGTGGTCTC TGTTGCCCGC GCGGTATGGA ACTGGATCTT 11760

CTTCTGAACC TGTTGTGGAT GATGTCCTCC CTGCCTCTGA GTCTGCTCAC AAAAGCGCTG 11820 

TTTTCGTTAC TGTCTCTCTT GTCCGTGCAA TAGCTCAATA ATAGAATAAA GCGATCGATA 11880 

ACTATTTCAT CGATCGTTTA TATCGATCGA TATGCTAATA ATAACCTTTA TTACCAACAT 11940 
GCGCAGATAC GCACAGACAG ACATTCAGGG GACGACAGAA CAACACTTCA GAAACTCCCG 12000

TCAGCCGGAC CTCCGGCACT GTAACCCTTT ACCTGCCGGT ATCCACATCT GTGGATACCG 12060 

GCTTTTTTAT TCACCCTCAC TCTGATTAAG GAAATGCTGA TGAAACGACA TCTGAATACC 12120 

TGCTACAGGC TGGTATGGAA TCACATTACG GGCGCTTTCG TGGTTGCCTC CGAACTGGCC 12180 

CGCGCACGGG GTAAACGTGG CGGTGTGGCG GTTGCACTGT CTCTTGCCGC GGTCACGTCA 12240 

CTCCCGGTGC TGGCTGCTGA CATCGTTGTG CACCCGGGTG AAACAGTGAA TGGCGGAACA 12300

CTGGTAAACC ATGACAACCA GTTTGTATCC GGAACAGCTG ATGGCGTGAC TGTCAGTACC 12360

GGGCTTGAGC TGGGGCCGGA CAGTGACGAA AACACCGGCG GGCAATGGAT AAAAGCGGGT 12420 

GGCACAGGCA GAAACACCAC TGTCACCGCA AATGGTCGTC AGATTGTGCA GGCAGGAGGA 12480

ACTGCCAGTG ATACGGTTAT TCGTGATGGC GGAGGGCAGA GCCTTAACGG ACTGGCGGTG 12540

AACACCACGC TGGATAACAG AGGTGAGCAG TGGGTACACG GGGGAGGGAA AGCAGACGGT 12600

ACAATTATTA ACCAGGATGG TTACCAGACC ATAAAACATG GCGGACTGGC AACCGGAACC 12660

ATCGTCAACA CCGGTGCAGA AGGTGGTCCG GAGTCTGAAA ATGTGTCCAG CGGTCAGATG 12720 

GTCGGAGGGA CGGCTGAATC CACCACCATC AACAAAAATG GCCGGCAGGT TATCTGGTCT 12780 

TCGGGGATGG CACGGGACAC CCTCATTTGC GCTGGTGGTG ACCAGACGGT ACACGGAGAG 12840

GCACATAACA CCCGACTGGA GGGAGGTAAC CAGTATGTAC ACAACGGTGG CACGGCAACA 12900

GAGACGCTGA TAAACCGTGA TGGCTGGCAG GTGATTAAGG AAGGAGGAAC TGCCGCGCAT 12960

ACCACCATCA ACCAGAAAGG AAAGCTGCAG GTGAATGCCG GCGGTAAAGC GTCTGATGTC 13020

ACCCAGAACA CGGGCGGAGC ACTGGTTACC AGCACTGCTG CAACCGTCAC CGGCACAAAC 13080 

CGCCTGGGAG CATTCTCTGT TGTGGAGGGT AAAGCTGATA ATGTCGTACT GGAAAATGGC 13140

GGCCGTCTGG ATGTGCTGAC CGGACACACA GCCACCAGAA CCCGTGTGGA TGATGGCGGA 13200 

ACGCTGGATG TCCGCAACGG TGGCACCGCC ACCACCGTAT CCATGGGGGA TGGCGGTATA 13260

CTGCTGGCCG ATTCCGGTGC CGCTGTCAGT GGTACCCGGA GCGACGGAAC GGCATTCCGT 13320 

ATCGGGGGCG GTCAGGCGGA TGCCCTGATG CTGGGAAAAG GCAGTTCATT CACGCTGAAC 13380 

GCCGGTGATA CGGCCACGGA TACCACGGTA AATGGCGGAC TGTTCACCGC CAGAGGGGGC 13440 

ACGCTGGCGG GCACCACCAC ACTGAATAAC GGTGCCACGC TTACCCTTTC CGGGAAAACG 13500 

GTGAATAACG ATACCCTGAC CATCCGTGAA GGTGATGCAC TCCTGCAGGG AGGCGCTCTT 13560 

ACCGGTAACG GCAGGGTGGA AAAATCAGGA AGTGGCACAC TCACTGTCAG CAACACCACA 13620

CTCACCCAGA AAACCGTCAA CCTGAATGAA GGCACGCTGA CGCTGAACGA CAGTACCGTC 13680

ACCACGGATA TCATCGCTCA TCGCGGCACG GCCCTGAAGC TGACCGGCAG CACCGTGCTG 13740

AACGGTGCCA TTGACCCCAC GAATGTCACC CTCGCCTCCG GTGCCATCTG GAATATCCCC 13800

GATAACGCCC CGGTTCAGTC AGTAGTGGAT GACCTCAGCC ATGCCGGACA GATTCATTTC 13860 
ACCTCCGCCC GCACAGGGAA GTTCGTACCG GCAACTCTGC AGGTGAAAAA CCTGAACGGA 13920

CAGAATGGCA CCATCAGCCT GCGTGTACGC CCGGATATGG CGCAGAACAA TGCTGACAGA 13980

CTGGTCATTG ACGGTGGCAG GGCAACCGGA AAAACCATCC TGAATCTGGT GAACGCCGGC 14040

AACAGTGCGT CGGGGCTGGC GACCACCGGT AAGGGGATTC AGGTGGTTGA AGCCATTAAC 14100

GGTGCCACCA CGGAGGAAGG GGCCTTTGTC CAGGGGAATA T3GTGCAGGC CGGGGCCTTT 14160

AACTACACCC TCAACCGGGA CAGTGATGAG AGCTGGTATC TGCGCAGTGA AGAACGTTAT 14220

CGTGCTGAAG TCCCCCTGTA TGCCTCCATG CTGACACAGG CAATGGACTA TGACCGGATT 14280

CTGGCAGGCT CCCGCAGCCA TCAGACCGGT GTAAGCGGTG AAAATAACAG CGTCCGTCTC 14340

AGCATTCAGG GCGGTCATCT CGGGCACGAT AACAACGGTG GTATTGCCCG TGGGGCCACG 14400

CCGGAAAGCA GCGGCAGCTA TGGCTTCGTC CGTCTGGAGG GTGACCTGCT CAGAACAGAG 14460

GTTGCCGGTA TGTCTGTGAC CGCGGGGGTA TATGGTGCTG CTGGCCATTC TTCCGTTGAT 14520

GTTAAGGATT ATGACGGTTC CCGCGCCGGC ACGGTCCGGG ATGATGCCGG CAGCCTGGGC 14580

GGATACCTGA ATCTGGTACA CACCTCCTCC GGCCTGTGGG CTGACATTGT GGCACAGGGA 14640

ACCCGCCACA GTATGAAAGC GTCATCGGAC AATAACGACT TCCGCGCACG GGGCCGGGGC 14700

TGGCTGGGCT CACTGGAAAC CGGTCTGCCC TTCAGTATCA CTGACAATCT GATGCTGGAG 14760

CCACGACTGC AGTACACCTG GCAGGGGCTC TCCCTGGATG ACGGTAAGGA CAACGCCGGT 14820

TATGTGAAGT TCGGGCATGG CAGTGCACAA CATGTGCGTG CCGGTTTCCG TCTGGGCAGC 14880 

CACAACGATA TGACCTTTGG TGAAGGCACC TCATCCCGTG ACACCCTGCG TGACAGTGCA 14940

AAACACAGTG TGCGTGAACT GCCGGTGAAC GGGTGGGTAC AGCCTTCTGT TATCCGCACC 15000 

TTCAGCTCCC GGGGAGACAT GAGCATGGGT ACAGCCGCAG CCGGCAGTAA CATGACGTTC 15060 

TCACCGTCCC GGAATGGCAC GTCACTGGAG CTGCAGGCCG GACTGGAAGC CCGTGTCCGG 15120

GAAAATATCA CCCTGGGCGT TCAGGCCGGT TATGGCCACA GCGTCAGCGG CAGCAGCGCT 15180 
GAAGGTTATA ACGGCCAAGC CACACTGAAT GTGACCTTCT GATAATTCGG CATTGTCTCT 15240

CTGTGGTCCC GGTCATCATG ACCGGGACCC GGACAGGTGC AAACGCTTCA GTGCCACATT 15300 
CACTGGCATT CACAATAACA TGATATTCAT CACGGAGTGA CTATGTTACA GATAGTCGGT 15360

GCGCTGATTC TGCTGATCGC AGGATTTGCC ATTCTTCGCC TTTTGTTCAG AGCATTAACC 15420 
AGCACAGCGT CTGCGCTGGC AGGGTTCATA TTGCTGTGTC TGTTCGGCCC GGCTTTACTG 15480 
GCTGGCTATA TCACTGAACG CATAACCCGG TTATTCCATA TTCGCTGGCT GGCAGGCGTA 15540 
TTTCTGACGA TTGCCGGAAT GGTCATCAGC TTCATGTGGG GAGTTGATGG TAAACATATC 15600

GCACTGGAGG CTCATACCTT TGACTCTGTA AAATTTATTC TGACCACCGC TCTCGCCGCT 15660 
GGTCTGCTGG CTCTTCCCGT GCAGATAAGA ACCATTCAGC AGAACGGGCT CACACCAGAA 15720

GATATCAGCA AGGAAATTAA CGGGTATTAC TGCTGTTTTT ATACTGCTTT TTTCCTTATG 15780 
GCGTGTTCTG CATACGCACC ATTGATCGGA TTGCAGTTCG ATATTTCACG CTCACTGATG 15840

TGGTGGGGCG GGTTGTTGTA CTGGGTGGCT GCATTAGTGA CGCTGCTATG GGCGGGGAGC 15900 

CAGATCCAGG CGCTGAAAAA ACTGACCAGT GCCATCAGCG AGACACTGGA AGAACAACCG 15960 

GTGCTCAACA GTAAATCGTG GCTGACCAGT TTGCAAAACG ATTACAGCCT TCCTGACTCA 16020 

CTGAGGGAGG GCATCTGGCT CACGGTCATT TCACAACGGA TTTCGGGGGG AGAAGTGAGG 16080 

GAATTTGAAC TGGCAGAGGG AAACTGGCTA CTGGACAATG CCTGGTATGA AAGAAACATG 16140

GCGGGTTTCA ACGAAAAGCT GAGAGAGAGC CTGTCATTTA CCCCTGATGA ACTGAAAACC 16200 

CTCTTCCGGA ACCGCCTGAA TTTATCACCG GAAGCGAATG ACGATTTTCT CGATGGTTGG 16260 

CTGGACGGGG GTGACTGGTA CCCCTTTTCA GAAGGCCGCG GTTTTGTATC ATTCGACCAC 16320

GTGGATGAGC TTCGTATGTG TGCGTGCTGG GGGCTGACAG AAGTACATCA TGCCGCGGAA 16380

AATCATAAGC CGGATCCGGA ATGGTACTGG TCCTCTCTTT GTCGCGAAAC AGAAACACTG 16440 

TGTGAGGACA TTTATGAACG TTCTTACACC GGTTTTATTT CCGATGCAAC GGCGAATGGT 16500 

CTGATTCTCA TGAAACTGGC GGAAACCTGG AGTACAAATG AGAAAATGTT TGCTTCCGGA 16560 

GGGCAGGGAC ATGGGTTTGC CGGTGAACGG GGAAACCATA TTGTCGACAG AGTCCGTCTG 16620 

AAAAACGCAC GGATCCTCGG TGATAATAAT GCCAAAAATG GAGCAGACAG ACTGGTCAGC 16680 

GGAACAGAAA TCCAGACGAA ATATTGTTCA ACTGCAGCCC GTAGCGTCGG TGCGGCATTC 16740 

GACGGACAGA ACGGACAGTA TCGTTACATG GGAAATCATG GTCGCATGCA ACTGGAAGTC 16800 

CCCGTGATGA GTATGCCGGC GCTGTGGAAA CCATGAAGAA TAAGATCCGC GAAGGTAAAG 16860 

TACCCGGTGT AACCGATCCC GAAGAAGCGT CCCGGCTGAT TCGTCGGGGA CATCTGAGTT 16920

ATACCCAGGC CCGTAATATC ACCCGGTTCG GGACCATCGA ATGGGTCACT TATGATATTG 16980

CCGAGGGGTC GGTTGTCAGT CTGGGGGCGG GAGGGATCAG TTTTGGCCTG ACGGCATCGG 17040 

TCTTCTGGCT CAGCACCGGC GATGGCGATG CTGCCCTGCA GACAGCTGCT GTCCAGGCAG 17100 

GAAAAACCTT CACCCGCACA CTGGCTGTCT ACGTCACAAC CCAGCAACTT CACCGGCTCA 17160 

GTGTTGTTCA GGGTATGCTG AAGGATATTG ATTTTTCGAC GGCCAGCCCG ACTGTCCGGC 17220

AGGCGCTTCA GAAGGGGACC GGTGGAGGAA ATATCAGTGC CCTGAACAAA GTGATGAAGG 17280

GGTCGCTGGT GACATCTCTG GCACTGGTAG CTGTCACAAC CGGCCCTGAC ATGATCAAAA 17340

TGTTGCGGGG ACGGATCTCC GGTGCGCAGT TCATCAGGAA TCTTGCCGTG GCATCTTCCT 17400 

GTGTGGCAGG TGGTGCTGTC GGGTCAGTGG CGGGCGGGAT ATTGTTCAGT CCACTGGGAG 17460 

CATTTGGTGC ACTGACAGGG CGTGTGGTTG GCGGTGTTCT GGGGGGAATG ATTGCCTCCG 17520

CTGTATCAGG AAAAATTGCC GGAGCGCTGG TTGAAGAAGA TCGCGTCAAA ATTCTGGCAA 17580 

TGATTGAGGA GCAGGTGACA TGGGTTGCCG GCAGTTTCCT GCTGACCGGA CATGAGATTG 17640

AAAATCTGAA CGCGAATCTG GCCCGTGTTA TCGATCAGAA TGCTNCTGGA GATGATTTT 2 17700 
GCCGCCGGTA 17710

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1803 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
AATAACCAAT AGATGCTTAA GTTTACGATA TGCCTCAACC CGCGTCTGCT CTAAGCTGAT 60 

AAGGCCAGTT TTGTAGAGAT CCGCTGCCAA GGTTGCCTGC GTTTGCACAT CCATGTAACC 120 

GGCGGTGATT TCATTCATGG CATCGTTATC TTGACCAGTC AGCTTAGCAC GCTCCTGTTC 180 

AAGCTGCTTG GTTAGGGCGT CAACTCGGCT CTGTAATGAG ACTACGGCCG GTGCGGTTTC 240

CTTCATATAG CTGCGCAGTT GTTTTAGCTC CGCCTGTTGA CGCACCAGCT CTCCTTCAAT 300 

CTGGCTGACC ACTCCCAAGC GTGCGCTGCT GGTAGATTCA GGGCTGAGAA GTTGGTGGCT 360 

ATTCTGAAAT GCTAATACTT TAGCTTTTTC ATCCTGTAAG CGTTGATATG CTCTATTTAC 420 

TTCTTTTTCA ACAAAGGCCA ATTGTTCGAG CGCAACCTGA TGACCTAATT TGTTAATAAA 480

ACGCTCCGAT TCTTTGAGCA TTAACTCAAC AACTCGCTGA CCGTATTGGG GATCAAATGT 540

CTGCAACTCA ACGGTAAGTA CTCCTGATAA TTCATCAAGG TGTAACGTCA AATGTTTGCG 600 

GTAATAATCA AGAAAATCTT CCCTACTGAC TCCCTTATGC AACCGCGAGA AATAATCTGC 660 

ACTATCACTC TGGAAATGTG CTTTAAGTGC AAGTTCTTTG TCCAACTTGG CCAGCATATC 720 

CCATGACTTC ATATAATCCT GAACGAGTAA TATATCCTGA TGATTACTAC CACCTATCCC 780
TAACATTGAT AACGCATCAG GCAACATTTT AACTTGATCG GCTTGTTTAA TCATTAATTC 840 
AGCCCGGSTC ACATAACGAT CGGAAGCAAT GAAGCCAAAA TAGAGCACTG CGATAGAAAA 900 
GCAGATAACT ACCCAAAGAA AACTGCCTAG CTGTAAACTT TTCTTCCACG AGCGGTGTAC 960 

AATTTGATAT CCTCTCGAAT CAATCAAAAA TAGTTTTGGA TTATTGCTCA GTTTTCTTAA 1020 

CTTTCGCGTA AGGCGAGATA TTGAGGATGA AGAATTCGGA GATGTCATAA TCAGTTGCTG 1080 

CTCAAAGTGA CTGGTAAATT TTGATGGCAT CATCAATATT ATCAAAAACT TCTAATTTAC 1140

CATCACGTAA CAAGATGCCC ATATCGCATT GTTGTCGTAG ATTTTTCATA TCATGCGAAA 1200 

CCATAATCAA ACTAGCTGTT TCTCGCTTTT TGTTAAATAC ATCAATACAT TTTTGTTTAA 1260

AACGTGCATC ACCTACTGAG GTAATTTCAT CGGTAAGATA TATATCAAAA TCAAAAGCCA 1320

TACTAACAGC AAAAGAAAAT TTTGATTTCA TGCCGCTAGA GTATGTTTTA ATAGGCAGCT 1380 

CATAATGTTG TCCAATTTCA GAAAACTCTT TAACCCACTC TTCTACGGGG CTTGTATCGC 1440

GTACACCATG AATGCGGCAA ACAAATCGCG TGTTTTCACG ACCAGTCATA CTACCTTGAA 1500 
ATCCCCCA; TAGTGCTAGA GGCCAAGATA CTCGGCAGAG ACGAGTTACT TTCCCCCTGT 1560

TAGGCGTATC CATCCCTCCT AACAAACGTA ACAAAGTAGA TTTYCCKGCT CCATKGATAC 1620

CTAGAATACC TATATTACGG TCCCTTGGTA GCTCAATATT TACATTCCTC AGGACATAAT 1680

TTCGTCCAAA TTTAGTTGGA TAATATTTTG ATACATTATC AAGAATAATC ATTTTTCTTA 1740

ACGCTAACTA GCAATCAATT GGCGAT3CCG TAATCGGTAA CAACTCATAG CAAAAGTGAG 1800

CAA 1803



(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1283 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 

NGGACCCAAG GTAAAAACNG GTAAAAAAAA CMATTGACCG ATTAAACTTT ATTTCTCTGC 60 

CCGCATTAGT CTGGAGAGAG GATGGATGTC ATTTTAATTT NACTAAAGTC AGTAAAGAAG 120 

CAAACAGATA TCTTATTTTT GATCTGGAGC AGCGAAATCC CCGTGTTCTC GAACAGTCTG 180 

AGTTTGAGGC GTTATATCAG GGGCATATTA TTCTTATTGC TTCCCGTTCT TCTGTTACCG 240

GGAAACTGGC AAAATTTGAC TTTACCTGGT TTATTCCTGC CATTATAAAA TACAGGAAAA 300 

TATTTATTGA AACCCTTGTT GTATCTGTTT TTTTACAATT ATTTGCATTA ATAACCCCCC 360

TTTTTTTTCA GGTGGTTATG GACAAAGTAT TAGTACACAG GGGGTTTTCA ACCCTTAATG 420

TTATTACTGT CGCATTATCT GTTGTGGTGG TGTTTGAGAT TATACTCAGC GGTTTAAGAA 480

CTTACATTTT TGCACATAGT ACAAGTCGGA TTGATGTTGA GTTGGGTGCC AAACTCTTCC 540

GGCATTTACT GGCGCTACCG ATCTCTTATT TTGAGAGTCG TCGTGTTGGT GATACTGTTG 600 

CCAGGGTAAG AGAATTAGAC CAGATCCGTA ATTTCCTGAC AGGACAGGCA TTAACATCTG 660

TTCTGGACTT ATTATTTTCA TTCATATTTT TTGCGGTAAT GTGGTATTAC AGCCCAAAGC 720 

TTACTCTGGT GATCTTATTT TCGCTGCCCT GTTATGCTGC ATGGTCTGTT TTTATTAGCC 780

CCATTTTGCG ACGTCGCCTT GATGATAAGT TTTCACGGAA TGCGGATAAT CAATCTTTCC 840

TGGTGGAATC AGTCACGGCG ATTAACACTA TAAAAGCTAT GGCAGTCTCA CCTCAGATGA 900 

CGAACATATG GGACAAACAA TTGGCAGGAT ATGTTGCTGC AGGCTTTAAA GTGACAGTAT 960 

TAGCCACCAT TGGTCAACAA GGAATACAGT TAATACAAAA GACTGTTATG ATCATCAACC 1020 

TGTGGGTTGG GGTGCACACC TGGTTATTTC CGGGGATTTA AGTATTGGTC AGTTAATTGC 1080 

TTTTAATATG CTTGCAGGTC AGATTGTTGC ACCGGTTATT CGCCTTGCAC AAATCTGGCA 1140

GGATTTCCAG CAGGTTGGTA TATCAGTTAC CCGCCTTGGT GATGTGCTTA ACTCTCCAAC 1200 




TGAARTTCAT CATGGGAAAC TGGSATTACC GGRAATTAAW GGTGATATCA CTTTTCGTAA 1260

TATCCGGTTT CGCTATAAGC CTG 1283

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6836 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 

TCAACCTGAC CAACCACTAG AATCAACTCA CGTCCGTCGT TAGGGGGCTC ATATTCTTGT 60 

GTACTCCCCA CATTGTATTT ACTGACTCGT GATGATTGTA ATTGCGCTAA TAATGACTCT 120

GCGCGTGCTT CTTCTTTCGC ATCTAAAACG TACGTAGTGA GTAACTGCTC AAGCTTACTC 180 

GGACGGCGGC TATCAAAATA GATTCCAACG GGGTCAATCG AGAGTGATGA AGGTCGACAT 240

AAATTAGACC CCAATCCGTT GGAGCGGATA AAACCATCTT CAATCCGGAT CACTGATTGC 300

AGTTCAGGAT AACGGTTTCC CCACACCAAC ACCTGTTCAT CATCTTTTAA CTGTGAGGGC 360

ACAGTACGAA CAAAACAAAG TTCATCTGCC AAATACGCAC AAAATGTGCG TATAAAAGCA 420 

CGCTTCCACA GAGAAAAACC AACGAGATAA AGACGACGCC AAGGTTTGGG CTCTACCTGC 480

TGCTGAGCCA AAATCGCTAC AACATCTTCT ACCTCACAAC GTTTTCCCAA TATAGGATCT 540

AAATAACGCG GATAACGGAT CAACGCCGCC GCAACTAAGC GGGGCAATGA AATAGATGAA 600 

ACGCCTTCGG CTGACATTGC TTCTTCACGG CGTATACAAC GTTTACTGTC ATGCGTTAAC 660 

CCCCACCCAG CATAAAATGG CATACCGAAG CAATATACAG GTTTGCCCAA CAGCAACGCT 720

TCCAAAGCCA ACCTGCGATG AAACTGTGTA CACCGCATCC ACCATACGAA TTATTCTATG 780

CGGATGGCAA GTTCACTCAC CACCTCAACA TCAGCCAGTC GAGGATCACG CCCCACTAAA 840

CGTGCTAACA CGCCGCTTTT TTTGCTAAAG CGTGTATCTG GGTGTGTTCG CAACAATAGA 900 

CGCGCATTAG GGTGATTACG GCGAGCCTCG ACCACCATAG AAACAAAATC AGCTTCGCAA 960 

GCAAGAGCCC CAGAAATTGA CAAGTCTCCC GCTACTTGAT CCACAAGCAA AATACGCGGT 1020 

CTTGGATCAT CCAGTAAACG TGCTAAGTTT GAATGAGCCG TGAGGTGAAT AACTCAGGTT 1080 

GTATATGTGT CGGTAAATCT AAAGAAGGCC CGTCAGTAGC ACGGGACAGA GCC AT TAAAT 1140 

GTATGCTCAG TGCTATTGGG TATAGCAGTT ATACTTGGTG ATTCCTAAAC GCAAAATATC 1200

MGAGATCAGA TGCTCCAGCG CGCGCAAAGT AAAGCCGTAT CCAACAGGTT CCAATAATAA 1260

GCTGTTCTAA TTGACTCGTC TGATGTGCAT CATAATATAT CCCCAGAGGG TCAGCAATAA 1320 

GAGAAACCGC CTTTCCTCCT TTTGCTGGGT GCCCGATATA GCCAATAAAA CCATCTTCAA 1380 

GTTGCCAATA AGATATTCCT AACTCTTGAG CTTTCTGTTT AATCTGCTTA GTATTAGATT 1440
TTTTTCCCCA GCCAACTAAA ACGTCATTTT TAGAAAAAGC CTCGTCTCCT TTCATATAAA 1500

GCAATGGGTG ACCAAGCATA GGCTCAATAT TATTTTYTCT GGCAAGAATC CCTTTCGATC 1560

CCGTATATAA ATACATGTTG TCTCTGTGAA CTGAAGATTC TCTACAATGG TGTATAAAGT 1620

GTGATTTAGA TGAACAGCTC TGCGCTCTCT AATGACTTTG CAATACTATC TTTTGCTGAA 1680

GTGAGAATGT CCGCCTTTAA CTCGGGCCAC CTAATACCAA TTGTAGGATC ATTCCATGCA 1740

ATGCCTCTAT CACTGGCAGG GGCATAATAA TTAGTTGTTT TATACAAAAA TTCGGCCGAT 1800

TCAGTCAGTG TTACAAAACC ATGGGCAAAT CCTTCCGGAA TGCATAATGT CGTTTGTTTT 1860

CCCCTGAAAG ATGAACGCCA ACCCATTGTC CGRAGCTCGG TGAGCTTTTG CGAATATCTA 1920

CCGCAACATC AAACACTTCA CCGGCTACAC AACGCACTAA CTTGCCCTGG GCATGGGGAG 1980

GTAACTGATA GTGCAAGCCA CGCAGTACCC CTTTAGAAGA TTTTGAGTGA TTATCCTGCA 2040

CAAAGGTAAC TGGATATCCT ACAGCCTCTT CAAACAACTT GTGATTAAAA CTCTCAAAGA 2100

AAAAACCACG CTCATCTCCA AATACTTTTG GCTCAAAAAT AAGCACACCA GGAATTGCTG 2160

TCTTGATTAC ATTCATCTAT ATGCCCACAT TTAATTAAAT ATTTTTAGGG GAAGCATATT 2220

CCCTCCCCCT TCTCAATTAC ATCACGCCTT ATCAATCATT TTTAATAAAT ATTGCCCATA 2280

GGCGTTTTTT GCCAACGGAG CAGCAAGYTC ACGAACCTGG TCGGCACTAA TAAACTTCTG 2340

GCGATAAGCA ATCTCTTCCG GACAAGCCAC TTTCAATCCC TGACGCGTCT CGATGGTCTG 2400

AATAAAGTTA CTCGCTTCAA TTAGGCTTTC GTGGGTACCG GTATCAAGCC AGGCATAACC 2460

ACGCCCCATC ATTGCCACCG ATAGATTGCC TTGCTCCAGG TAAATACGGT TCACATCGGT 2520

GATTTCCAAC TCACCACGCG GCGATGGCTT GAGACCCTTG GCAACGTCCA CAACGCTGTT 2580

GTCGTAGAAA TAGAGGCCGG TGACTGCGTA STACTCTTAG GCTCCAGTGG TTTTTCTTCC 2640

AGTGAAATAG CGGTACCTTG ATTATCAAAT TCGACCACTC CATAACGTTC CGGGTCGTGC 2700

ACATGATAGG CAAATACAGT AGCACCGGTC TCTTTGGCCG CGGCTGCCTC CAACTGTTTC 2760

TGTAGGTCAT GAGCGTAGAA GATGTTATCC CCCAGCACCA GTGCACACGG GGCTGAACCA 2820

ATGAATTCTT CACCTAGAAT AAAAGCTTGT GCCAACCCGT CTGGGCTTGG CTGAACCTCA 2880

TATTGTAAAT TCAGTCCCCA GTGGCTGCCA TCACCCAGCA ATCGCTGAAA GGANGGAGTA 2940

TCTTGTGGAG TGCTAATGAT CAAAATATCG CGAATTCCAG CCAGCATCAG GGTGCTCAGC 3000

GGCCGCAGTA CTGGATCATC GGCTTGTCAT AGATGGGCAA CAACTGCTTG CTCACCGCCA 3060

TAGTAACCGG ATAGAGACGT GTACCAGATC CACCGGCCAG AATAATACCT TTACGTTTAG 3120

TCATGATGCT TGTTTCTTAT TTTTAAATTA CATAAGAATA AAGTGGCTTG AGCCGCGCCT 3180

TTCTGTTTTA TCCTCACCTG TGGTTTACTT CCCCATGATC TCAGTCAACA TCCGCTCAAC 3240

ACCGACTGAC CAGTCCGGCA AAACCAGATC AAATGTACGC TGGAATTTTT TAGTATCAAG 3300

TCGGGAATTA TGAGGGCGTT TCGCCGGGGT CGGAAAGGCG CCTGTCGGCA CTGCATTAAG 3360

CTGTGTGACT GCCAGTTCAA CTCCTGCGTC TCTGGCTTTG TCAAACACCA ACCGGGCGTA 3420

GTCAAACCAA GTGGTAGTAC CGGAGGCAGC CAAATGGTAC AGCCCGGCAA CGTCGGGTTT 3480

GCTCTGTGCA ACTCGGATTG CATGGGCGGT ACAATCGGCC AGCAACTCAG CTCCAGTTGG 3540

AGCGCCAAAC TGATCATTAA TGACCGATAT CTCGCGACGC TCTTTGCCAA GACGCAGCAT 3600

AGTTTTGGCG AAGTTGGCAC CGCGCGCAGC ATAAACCCAA CTGGTACGAA AGATAAGGTG 3660

ACGTGAGCAG AGTGCCGCAC CGTGTTCCCC TGCCAGCTTG GTTTCGCCAT AGACGTTGAG 3720

CGGGGAAATC ACATCGGTTT CCACCCAAGG ACGTTCACCA CTTCCATCGA AAACATAGTC 3780

GGTGGAATAA TGTACTAGCC ACGCACCTAA TGCTTCAGCT TCTTTGGCAA TAACCGCCAC 3840

ACTAGTTGCA TTGAGTAACT CGGCAAATTC CCGCTCACTC TCCGCTTTGT CGACTGCAGT 3900

ATGGGCCGCT GCGTTAACAA TCACATCCGG CTTGACGAGA GGTACCGTTT CAGCCACCCC 3960

TGCAGAATTG CTAAAATCAC CGCAATAGTC GGTGGAGTCA AAATCAACGG CAGTGATGTG 4020

CCCCAGAGGC GCCAATGCAC GCTGCAGGCC CCATCCACTT TCTGGCCACA CCAGACTCGC 4080

CAGCAAAAAA GTGAGTGCTG TCAATAACTC AACCAGCGGA TAACGCTTGC TGATTTTCGC 4140

CTGACAGTCG CGGCAGCGCC CTTTGAGCAT CAACGATGAG AGCAGCGGAA TATTGTCACG 4200

AACGCGGATG GTCTGCTGGC AATGCGGACA GTGCGAACGC GGTAGCGCAA GGCTTATTTT 4260

TGACTGCGCA CTCGGCATTT CACCATGAAA CTCCGCCATT TGTTGGCGCA GCATGATGGG 4320

GTAACGCCAA ATCACCACAT TCAAAAAACT GCCGATGATC AATCCTCCGA CGGTTGCCAG 4380

TATGGGCATC GCCGCGGGGT ATTGCTGAAA AACATCAAAA AGCATGGTTA AAGGTTATTT 4440

GTTGTAACTT GCCGGATGCG GGCCTGCGGG TGTATGCCAT ACGGCTTTCC TTCAGGCCCG 4500

ATGCGCCTTA TTTCATGCCG GATGCGGCGC GAGCGCCTTA TCCGGCATAC AGGCTTACTC 4560

AGCTGACATC TTATGCTCGG TAACCTGATT AATGGTTTCC GGCCCTTGCT GCGGTTTCGG 4620

CAGATTAAGC GCCGCCAGTG TCTCGTAAGC CGACTGGCTC ACACCGCCCT CGAAGTTCAT 4680

CTCGCTCGCT CCCGGCAACT GGTAAGCATT CGCGCCCGGA TTCCATTTCT TAAAGAACTC 4740

CGAAAGATCC GTCTGGGCGA CCCAGGATGC ACACAGCATC AGCTTGTCGG CAGCGTTACC 4800

GTTGGATTCG GCACAGTAAT TTCTTTCGCC AAACTTGGTT TTGCCAACCT CATCGCCGCG 4860

TGCTTTACGG TGCATCAACT GGAACAGGTT CCAGCCTTTC ATCCCTTCAC GATCGCTGTA 4920

GAACTTAGGC AGGTCACGTT CTGGATACCA CTGTTTGATA TCAAAGTTTT TCTCTGCCCA 4980

CTCTTTCAGC TGTGCGTACA TCAGCAGACG GTCACCCGCA CCGCCGCGCG CCCATGCCTG 5040

ACCGTTGCTC TCCTCGAGAT ATTCCGGCGC GACGGTAATG TCGTCAGGGA CACGGTTCAT 5100

CTTGCCGAGA TAGCGATCCT GCAT3TACAG CGCCAGCACG TTGTTCGCTA CTTCAGTTGC 5160

GCCAGGAACA GTCAGCGGCG TTTCGGCGGC GTTGTGACCA ACTTCGTGCC AGATCAGCCA 5220

GTCGTTCAGC GGCGTCGTCG GCAGCGTGGT GCTGTTCGTC GAGAAGCTGC TGTTCATTAC 5280




CGGATAACCA GAGTGCGCAT CACCGATGGA GATCTGCACA TCGTTGGTGA AACGATGCTT 5340

GTGGCCCGTC AAGTTTTTAT AGGTAAACAT CCGGTGCTTA CCGTCTTCAT CATTACGACC 5400

GTAGAAGTCA TTCATCGAGC TGGCAAAGGT ATCCAGATCT TTAGCGAATT CTGCTACGCC 5460

ACCAGTGAAA TTGCTGGCCT CAAGGTTCTT CTTCGGCGTG GTGTAGACGA AAGCGTCTGA 5520 

CTCCAGCTCG CCCAACGGCG CAGGGGA3TT CAGAGCGTTT TTCCATGCGC CATGTTTATA 5580

GAACGGCGCT TTCACCACAC CAGTAAAGGT GAATTCGGCT GATTCATTCT GTGGGCTGTT 5640

GCCCTTGATA TAAATCAGAC CACCGTAAGG AACCGTAAAC TTCACCTCAC CATTGGCTTT 5700 

CAGCTCATAG GTTTTCGTCA CTTTTGGGGG ACGGTTCAGA GCGACTTCAT GCTTCTCACG 5760

TCCGGTAAGG TCGTCGGCCA GCGCCACGGT GACAGTCACA GGAACTGATG CAGAAGACTC 5820 

AATGGTGACC TCTTTCTGAG CCGGAGCCCA CAGGCCAGTA GACTGCATGT TACCCGCAAA 5880

CCATTTGGTC GGATTCGAGT ACAGGCTGAT GGTTTCAGTA ACCTTCTCAC CTTCTGCCGA 5940

TACCGCTCCC GGATACTTGT CGACATCAAC TTTGATGTTC AGATCCCACC AGGAACGACC 6000 

CAGCATCAGG CGCGTCAGCG GTTTTTCCAT ATAGTTGAGC GGATAGCTCG GGTTCATCAT 6060 

GCCCGCTTTA TTAACGCTCT TCTCGCCGTA GATCATGTTG TTATCGACCA GCGATTTTTT 6120 

CAGCTCATCA GAAACACTGC GTGCCGCCAG TATAGGCATC GTTGGCGTAG CAGTTCAGGA 6180

ACTCGGTGAA CGTTTTAAAG CCCAGCTCGT CATCCTTGTC GTTTTCATAG CGATATTCAA 6240

TTTTATTCCA CAGCCAGACC GACATGTTCT GGTACAGACG TTCCAGATCG ACGCTGCTCA 6300 

GACGCTCACC TTTGCGACCA TTGGTCCGGA AGTAGAGCTC ATGCTGATAC AGACGCTGAA 6360 

^GTTGGTGCC TAAATCCGCA GCCTGCACCA TCGCTTTTGC CGTGTCGGCG TTAAGGCTTA 6420 

GTTGCGTATA CTGTGGAACA TACATGCGAC CAGTAACCGG AACCCCCGTG CCAGGACGAT 6480

ATTCCAGACA GTTGACCTCG TAGTGGTAAG TTGGGTCCTT AGACTCCTTT AATCCAGGAA 6540

ACTTCTCAAA GATTTTTGCC TTCGCAGCCT TCAGAGAATC CTCTGTTTTA TGATCGGCCT 6600 

CATCAATAAA GGCATAACGC GTTTCCTGTT TGCCATCTAC ATCTTCCAGC CAGCTGGCAA 6660 

CTTCCAGCTT CGGTTTGTCA TCAGGTTTGT TTTCTACCTG ATATTTCCAC TTAACTTCCC 6720

CTGTCTTACT ATCGATGGTG TACGGCAGCG CACCATCTAC GGCAGGATAA CGTTCATAGA 6780

CCCAAATGCC CGTTGCGCGC TGCTGACGAA CGCGGTTCGG ATACCCTTGC GGATCC 6836 
(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1332 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 

GGAAAAACNC GCCGTATATT hGCGCGGGCG GAAAAAGCCC CGTNACGGGC AAACGCAGCA 60

AGGTTTTATC CCAGCGCAGG CGCATGGCAG GATTTTTGAG TAGCCGTTGC CCCAGCACCA 120

GAAGCCCCAG CAATCCCGCC AGCCAGTAAA CGCCGCTGGT CTGTAACGTG TCGCTCATGG 180

CGATGAGCGT GCGGGTGGAG GCGGGCAGCG CGTGTCCGAG ATGATCAAAC TGTTCGATGA 240

TTTTTGGCAC CACTGCCGTC AGCAAAATAG TGACCACGCC CGTTGCCACC ACCAGCAGTA 300

CCAGCGGGTA GAGCATGGCC TGCAGCAGGC GTGAATTTCC AGNACCTGCC GCTGTTACGG 360

TGTAACCCGC CAGGCGATTG AGCACCACGT CGAGATGTCC GGATTTTTCT CCGGCAGCAA 420

CCATCGAACA AAACAGGGAA TCAAAGACGC GGGGATGTTC GCGCAGGCTG TCCGACAGGK 480

TGTAACYTTC CTGAATCCGC 7GCGGAGCGC CATTCCGAGG CTTTTTACAT GCAGTTTTTC 540

ACTTTGCTCA CTGACCGCCT GTAAGCAGGT TTCCAGCGGC ATTGCTGCCT GTACCAGCGT 600

TGCCAGTTGG CGCGTGAACA GCGC-'-AGATC TGCCGCCGCC ACGCGACGAT GTGCGTGCCG 660

CCGACGCTGC AACATCCCCC CTGTCGAAGT ATTCATCCGG GCTTCAATAT GCACGGGGAT 720

AAGCTCTTTA CCGCGCAACA ACTGGCGGGC ATGACGCGCG GAATCCGCCT CAATCATACC 780

TTTGGTTTTG CGACCATTAC GCTCCAGCGC CTGATAGTAA AACAGTGCCA TTACGCCTCC 840

ATGGTTACCC GCAGAACTTC ATCGAGAGAG GTTTCTCCGG CGAGCACTTT CTCAATGCCG 900

TTGCTGCGGA TACCCGCAGA GTGTTGTCGG ACATAACGTT CCAGCTCCAG CTCCGCGGCC 960

TGACGGTGGA TCAAATCACG CAATGTGGCA TCCACCACGA TCAGCTCATG GATGGCAGTC 1020

CGTCCGCGAA AACCTTTGTG ATTACAGGCG GGACAGCCCT GTGGATGGTA CAGAGTGACG 1080

GTACGGGCGT CGGTAATTCC CAGCAGGCGT TTTTCTTCGT GGGTGGCAGG CGCGGCCTGA 1140

CGGCAGTCGG AGCACAGCGT GCGGACCAGT CGCTGCGCCA TCACGCCCGT CAGACTGGAA 1200

GAGAGCAGGA AAGGCTCCAC GCCCATATCG TGCAAACGTG TGATCGCCCC CACCGCTGTG 1260

TTGGTATGCA GCGTGGAAAG TACCAGGTGT CCGGTCAGTG AAGCCTGAAC AGCGATTTCT 1320

GCGGTTTCGG TA 1332


(2) INFORMATION FOR SEQ ID NO: 75: 









(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4407 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
CCCAACGTTT ATCGTATTTC ATTAAAGTCC CTTGCCCGAT GCTATCTCGA GTTACATGAC 
GAAATCGCTG ATTTGGATGT CATGATTGCG GCAATTGTCG ATGARCTGGC GCCTGAACTG 
ATTAAACGTA ATGCTATTGG ATACGAAAGC STTCGCAGTT GCTGATCACG GCAG3AGACA 180

ATCCCCAACG ATTAAGATCA GAATCAGGTT TTGCGGCACT GTGTGGTGTC AGCCCTGTTC 240

CCGTATCTTC AGGAAAAACG AATCGTTATC GACTTAACCG GGGTGGAGAT CGTGCTGCAA 300 

ATAGTGCACT TCACATCATT GCCATCGGAC GTTTGCGAAC TGAGGATAAA ACGAAGGAAT 360

ATGTCGCCAG ACGAGTAGCG GAAGGGCATA CAAAAATGGA AGCAATACGC TGCCTGAAGC 420

GCTATATCTC ACGCGAAGTT TATACATTAC TGCGTAATCA AAACAGGCAG CTCAACAGCA 480

TCCCGATAAC GGCTTGACTC TTAGAAGGGC GTCCAGGGCA GCCACTATAC AAGCAGGCAG 540

TTCCGGCAGT TACTGTGGCG TTACCAGATC AAACAGAGTC TGAGTCGACG AGGAAATTGC 600 

TGGGATAACA GCCCGATGGA GCGCTTCTTC AGGAGTCTGA AAAACGAGTG GATACCGGTG 660 

ACGGGTTACA TGAACTTCAG CGATGCTGCC CATGAAATAA CGGACTATAT CGTTGGGTAT 720 

TACAACGCGC TCAGGCCGCA CGAATATAAC GGTGGGTTGC CACCAAATGA ATCGGAAAAC 780 

CGATACTGGA AAAACTCTAA AGCGGTGGCC AGTTTTTGTT GACCACTACA TTTAGTGCGA 840 

CACGGGAAGC GCGATATGAA CGATACGATA CATCAATGGT TTATTGCGGT GATAACCTGA 900 

AGGGTGAGAT TGAGGCTATT TATAATAGTG TTGAGAGGCG TCAGGTTTAG AGCAGGAATG 960 

CTGAGTAGCC ATCTTATCGA TTGTTTTCGA GCGTAAGATG GCTGAATGGA ATGGCTATTA 1020 

TTGCACAGTC CTTAATTATA ACATTCATAC CGACATGATT ATCTTGTGTC CGGAAGAATC 1080 

AGAGGCTGCG GTTTCAGACT GTCTGCCGGT ACATTCCTCT CTCCGTTAAA AACCATAACG 1140

GGTTCATTAT CTTCGTCTGT CAGCAGATTG AATGGCGGTA TATTTTCAGT ACGAATGCCG 1200 

GTCAGCCACT GAAAAATACC TGCGAAATGA CGGGCACTGA TTTTTCTGCT GACGGACTGA 1260

TGAGACGTGA TGTCACTGGC GGTAATAATC AGGGGAACGC TGTAGCCTCC CTGCACATGA 1320 

CCATCATGAT GAACAGGATT AGCACTGTCG CTGACCGACA GACCATGGTC AGAAAAGTAA 1380 

AGCATGGCAA AATGACGGGA ATGCCGGCGA AGGATACCAT CAAGCTGCCC GAGAAAGTTA 1440

TCCCAGTTTA CTGATGCTGG CGAGGTAACA GGCAATTTTT CGGGGATACT GCCCCAGGTA 1500 

ATGATTCGGC CAGGAGTTAA GCCGGTCACA CGGGTTCGGA TGAGACCCCA TCATGTGCAG 1560 

GAATATCACT TCGGAGAGGA TTTATCCGCC AGTGCACGTT CTGTTTCCTG TAACAACAAC 1620

ATGTCATCCG TTTTACGGGA AGCAAAGCTG CCTTTCTTGA GGAAAACGGT ATGCTCCGCA 1680

TCAGAAGCAA TAACAGAGAT GCGTGTATCA TGCTCCCCCA GCTTTCCCTG ATTGGATATC 1740

CACCATGTGC TGTATCCTGC TTTTGCTGCC AGCGCCACCA CGTTGTTGCC GGAGTCAGGG 1800

TTCTGCTCAT AGTCATAAAT CAGTGTCCGG CTCAGGGAAG GTACGGTACT GGCTGCTGCC 1860

GATGTATAGC CGTCAATAAA TAAACCGGGA GCAGTATTCA GCCACGGTGT GGTTGGCACG 1920

GGATAGCCAT ATACCGACAT ATAATCCCTG CGCACACTCT CACCAGTGAC GATAACAATC 1980

GTGTCATACA ACGGTACACC CGGCAGGATT TTCCAGTTGT CAGCCCCGTG CTGATTCAGT 2040

TGTTTATAAC GCTGCATTTC ACGCAATGTG TCAGTTGTCC CCACAACAGT TCCTTTAACC 2100 
ATCCGCAACG GCCAGCTGTT TACTGAGCAT AATACGAACA GCAGCAGTGC CAGCCAGTTA 2160

CGGTGACCGC GGTGGTGTGT TCGCCAGAAA ATCACCATGA ATACCAGAAT CGCGGCACTG 2220 

ACCAGAAAAT GATAAACAGG AATCATCCCG GTAAACTCCG CTGCCTCATC AGTTGTGGTC 2280 

TGCAGCAACG GAACAATAAA ACTGTTGTTG ATTTTACCGT ACGTCATACC GGCAGGCGCA 2340

TACAGTGCAC AACAGAACAG AAATAACAGC GCTGTAATGG ATGTGAGGGT ATTTCTGTGT 2400

GCAAGAAGCA GAAGAAAGAA CAGCAGCAAC ACATTGCCGG TGGTATTCTT CTCAGTGTAT 2460

CCGCATGCAA TTGTGGTTAT GACAGAAACA ACAAAAAAGA ATAAAAACAA TATAATCCTG 2520

AGAGTGTTGG CCGGACAAAA CAGTTTTCTG ATATTCATCG GAGTATATCG ACAACATTAT 2580 

TATGAAGAGA ACAGGATAAT AAAAATCAGA AGTTATCTGT GAAACAGATA ACAGACANCC 2640

CTGCAGTATA ATATTACTGC AGGGTGTTCC TTTTTAATTA CAGAAATACG TAATTATCTT 2700

AATTGCAGAA ATATGCGCAA TTATCGTTCA GAAGCAGTGT CGTCAGAAGT TATAAGTCAC 2760

ACCAAGCAGG ATGTCATGAC TTTTAACATC AACCTCTGAT TTATATTTAT CCCCTTCTGT 2820 

ATCCTTGTAA TACAGGGAGG ATTTACCAGC ATCCAGATAG CGATAGGTGA GGTCAAGAGC 2880 

GATATCCGGG GTTACGTCAT AGCGAACACC GGCCCCAATG CTCCATGCGA AGTTGTCAGC 2940

AGAGCCTGAG CGTGATATAG AATAACGCAC TCGCTCACCG TAGCCATAAT CCCAACTACC 3000 

GCTACCTGTT GATTCCTGAT GAATTCTGGC GTAACCAATT CCGGCAGACA CCCATGGCGT 3060 

AAATGCACTG TCGTTTCTGA AATCATAGTA CGCATTCAGC ATCAGGCTGT TGACTGACAC 3120 

CTCATTCTTC AGGTCACTAT GTCCCGCGTG GTGCTTATAG AGGTTGTATG TTGTGTCAGC 3180 

TTTTCCACGG GCGTAAAACT CCAGTTCTGT ACGCACAGGA ATACTGAACT GCGGATGCAA 3240

GTCATAACCA AACGCTATAC CTCCACTGAA TACCGTGTTA TGGCCATCCC CCCCCTATAC 3300 

TTTGATGTTT CCTCTTTATT TTCGGACAGG AAACTCTGGT CAGAAAGAGA TACTGCTGAA 3360 

GTACCTGGTT TACCGGTCAG ATAAAAACCG CTTTTACCTT CCTCAGCACC CGCATTTGCT 3420

GCAANCATAC AGGCAGCGGT AACTGCTGAA ACAGCAAAAA CTTTTTTCAT TTCAATTAAC 3480

TCCATTATTT CACTATTTTT GTAAATAGCA CTCCTAATAT TTTAAAACCA GTCAAAAGAT 3540

AGTATCAAGC AAATTATTCA TGTCTAATGA ACAGATAAAA TCGACTATGT GTCGGCAAGA 3600

CTCTGCTCCA CCGATATTCC TCTTATTTCC GCCTCGATGA AATACCCCCG TTACCTTATT 3660

TGTACCCCTT ATAATGGGAT GTTGGCCAGC CAGACCCGGC ATGATTAGTT CTCCCTGTCG 3720 

ACTATGCTCC GGGAGGGATG TCACCGGGTC TGGTGAGGCG CGGATAACCG CTAATAGGGG 3780

AAGGTCAGGT ATTTTACACC GGGACCGTCA GGGCAAGATA ACGAAAGCCA GCTCCCCGCA 3840

TGAACTGACG CCAGATAGTT TCTGTCCATT GCTGCTTTTC TCATCTTACG TCTTAACCCT 3900 

GCCTTGAATA CCTTATCTCT CGTCAAAATA TTAATAGCGA TATGCCGTAT CCCTGAAAAT 3960

AATCCCGCTG CGTTTCCTCT TCTTACTTGC AGTCGTCTTC ATTCATTACC ACGTCCAGAC 4020

GCCATGCAGC TTATTCTCCA CGTGCCAGTG ATTTCGGATC GCTGTGACGA ACTTCTCTGC 4080

GGTTAAATGA GCAGAAGTGA TATAATATCT GACCATTATT TCTGACTCTT GCTTTTGTTC 4140

TGCTATTATT GAGCGAAAGG AGACTGCCAG GGATATTTTT TCAGCCCTTT GCATTCAAAC 4200

GTGAATTGAA TCAGCTCATC AGGGACNTCG CCAAACCATA TGAAGACGGG ATCCTNCTCT 4260

GCCGTGACTG TTGTCACTAA TTGCGTAACA GTCATGCTCN GGGATAATTA AATGTTTCAG 4320

GGGAAATAAA AAGATTATCA GATATGGGGA TGACACCACA GCAGCGCTGA GGGGAGTATG 4380

GATAAACGAT GTACCTTATT AACCAAA 4407

(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 824 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 

TTTTTTGCAA GAGAATTTCC CTGAACCTGA AGCTCATCAT CGCCATCTCC GCCGTTCAGG 60 

TAATTATTAC CTGCTCCCCC AATTAACTTA TCGTTGCCAT CACCGCCATA GAGCTGGTCA 120 

TCTCCGTTTC CACCACTCAG TGTGTCATTA CCTTTATCAC CATATAAGCG GTCATTCCCG 180

TCATTTCCTT CTATATGGTC ATCACCATCC GCGCCATGGA AGATATCAGC AAATTTACTG 240

CCAAAAAACT TGTCGGCACG CGTGGTCCCA ATAAGTTCTT CCACGGAATA TAAGTTATCA 300 

GTCTCTGTTA AATTTTTACC ATTGATATGA GTGAATTCAT AACTCCGATA TTGCGTTTTT 360 

TCAGTTCTTT TTCCAACTGA AACCTCCTGC TCCTTCACAA CTTCCTGTAA AACCTTAACA 420 

TCACCACCAA GTACACGTGT TACCGTGTAA TTACCCGCTT CGGTTGCTTT TGTGCCATCA 480

ATGGTCAGAT AACCGGTGTC TGTTTTATCA TAATAAACAA CATCATGTCC TTTACCTGCG 540

TAGATATTGG CTGAGCCGGC AGATAAAAAG ACCTTATCAT CCCCGTCTCC CAGGTGTGAC 600 

TCAATACGAA TTTCCCGATA CTGGTTATTA CCGACTGATG CATGCTGAAT CAGGTTAGAG 660 

TAATCATATA CAGACCCCTT GTCCTGNAAC CCCCTTCACC GTCCATTTAT CAACACCCTT 720 

GACTAATAAC TCGGTAATAT ATTCATATTT TCCGGACTGC CTCCTTTCAC GAATTTCCTC 780

ACCGGGAGTT TAACAATGGG CGTAACNAAT TTGCAATAAC GTGG 824

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 550 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

GNGGCCGCAG TACTGGATCA TCACCGAAGT TTCGCGCGGA AAAGCGTTAG AGAAAGATCT 60 

AATGCTTCAT GATGGTGATG GACTTTTCCT GATGGTGAAA TCCAGCGGGA AATGCTCTGG 120 

CGTTTCCGTT ATCAACATTC GACAACAAAG CAGCGGACAA TGATGGGACT CGGTGTCTTT 180 

TCCACACTTT CACTTGCTGA TACCCGAGGG CTAAGAGTGG ATTATATTTC CTTATTAGCC 240

AACAGAATCG ACCCGCAAAT TCAAGCTAAA GCCGTAGACG AAGAGCAATA TTTGAAAAGG 300

TGGGCACCTA CGTTACCAAT ACTGGCTTAA TGGCTACATA CGGCGGTCAG GGTCAGTTTA 360

CGCTTACAAA ATATAAAACA ATTTGATACA AAATATTCCT CTTATTCTAA ATAAAAGTAT 420

CTTGAAAACC TTCCAACTGG AAGGTAGATT GAATTTATGC TAAACATAAA GAGGAATTGC 480

TTATGAATTA CGTTATCCGC ACTACCACCG TCGTCTTTAG TCTCATGCTG GGCAGGTTAC 540

GCAACTGCTG 550

(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 382 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 

CACTAAAGGC CCTGGATGTT TTTCGCTCAT TAGTAGACAT CTCGCTGATA ACGGCGCTCT 60 

ACGCGCACTC ACTTAAAAAT TCATCCGCCG CTTCGGTGTC CATGCCACCA AATTCGGCAA 120

TCACTTCCAG AAGTGCCTGC TCAACGTCTT TCGCCATGCG ATTAGCGTCG CCGCAGACAT 180 

AAATGTGGGC ACCATCATTG ATCCAGCGCC ACAGCTCCGC GCCCTGTTCG CGCAGTTTGT 240 

CTTGTACGTA AACTTTTTCT TTTTGATCGC GCGACCAGGC AAGATCGATA CGTGTCAGCA 300 

CGCCATCTTT GACGTAGCGC TGCCAMTCCA MCTGGTACAG GAAGTCTTCC GTAAAGTGCG 360

GATTACCAAA GAACAGCCAG TT 382 
(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
TAAATCAGCA GAACTGATAT AATATCTGAC CATTATTTCT GACTCTTGCT TTTGTTCTGC 60 
TATTATTGAC CGAAAGGAGA CTGCCAGGCA TATTTTTTCA GCCCTTTCCA TTCAAACGTG 120 
AATTCAATCA GCTCATCAGG AACATCGCAA ACAATATGAA GACGGATTTC TTCTCTGCCG 180

TGACTGTTGT CACTAATTGC GTAACAGTCA TGCTCTGGAT TATTTAATTC TTTCAGCGAA 240

AATAAAAGAT TATCAGATAT GGGATGACAC ACAGCACCGC TGAGCAAGTA TGTATAACCA 300 

TGTACTTATA ACAAAAGGAG ACGTAAGAAG GGGAACGGGT ATCAGAGGGC CAATCAAAGC 360

AGGTATAATG AACGCCAGTA TAATTGTCCG CAACCCAGAA ATATATTATT GAACTGGTTA 420 

TCTCCTGCGA ATGCATATAC TGCAACGGCC GTTAAAATAG CATTATATCC ATAAAGCCCG 480

GCAGAGATTT TATCAGGAGA AAGCTCAGGA ATAGAGAATG ATACCACCAC ACTCAGAAAC 540

GAAGCGACAA CCGTAATCAT CAGTAGTTTC CGGCTCCCTG CAAGTAGTCC CAGCATAACA 600 

AGAATACCGC CGACAGCATC AGGAAACATA AAAATCTCCA TAAAGCTACC AGACAATGCC 660

ACCGGATAGT TTTTCAGCAA AACAGAACCT GCACTTCGCC CGAAGGTACT GACATATCAT 720 

GAGGCATTAT TCCGGAATGT AATAACCACG TAGCGATAAT AAAGGGGGCG GTCAATACGG 780

GTAACCCTCT GAGCACTGAC GACAACAGGG GAGTAAACAA AACAATACCA AGAGTTCCGA 840

CGATAAGTAC AGCAATTCCG GAGACTGACA CAGGGACAAG CATGCCACAG GCTATGCCAT 900 

ACAGAACAGC ATTATATCCC CATATACCTT CATTAATCTC CTCATCAGGA TACCGCAAAC 960

ACCAGGCAAA GAACGGAGAA AGTGCTGCAC TGATGGCTGA GAAATACAGT ATTTCGGGGT 1020

GCCCCATATT AAAAGAGGCT ATTCCAGTCG CCAAAAAAAA GAACAAGCCA GAAACAACAT 1080 

TGTTCTGTAA TAATACCTGT GAATACCCCT TACTAAAGGC GGTTATCACC TGTTTTACTC 1140 

TCATGTAAAA TGTCACACAC ACCTCATACA TAAACCATTC TCCGCTTCTG CGGGACAGTA 1200 

CCGCCCCTGA CTCCACCTCA CAGCGGATTG TGTATTTTTA AACAATCACA GTCTTCTCAT 1260 

ATACTTTCCA TTCTGAAGCT TATCTCTTCC TCCGTGATAA GCTTCCGTCG CGGGATGTGT 1320

TATACGCCCT GTAAGACAGT TATAAAGGAC ATCAATGCCA TAGTTAATGA YTACCGAATT 1380

CCGGTGGATA GTCAGTACTG GTTTGCCACA AAACAGTGCA GTCACACATG ACAGGAGAAG 1440 

ATATGAGCCG GATACCGCTG CTCTGAGACT TAACGCTCAT GTAAACTTTC TGTTACAGAT 1500 

TCTTCCAGGG ACTAAGAAGA TAACTGANTT ACGTTCGCAT TCCAGTSTTT ATTTCTGCAG 1560 

TGACAGCCAT ACCCGAGCTT AATGGAATGT GCTTATTCCC GGTTGACAAA TCATTCTCTT 1620

CAACAGAAAC AATGACATTA AAAACGAGTC CCAGTTTCTG GTCTTCTATT GCATCTAAAT 1680

TTATATTTTT TACCTTACCC ACCAGATAAC CATATCGGGT GTAAGGAAAA GCCTCCACTT 1740

TAATGATGGC ATTCTGCCCG ACGTTAATAA AACCAATATC TTTATTTTGT ACCAGAGCAG 1800 

TAACCTCCAG CGTGTCATCT TCCGGAACGA TGA2CATCAG TGTTTCCGCT GTTGTAACAA 1860

CCCCACCTTC AGTATGAACC TTCAGTTGCT GAACTTTTCC CGAAACAGGG GCCCTGATTA 192 0 

CTGAAGCCTG TTGACGCTCT TCATTTTTCT CTAACTCCAG AGTTAATAAC TCAATGCTGT 1980

CTGTTGTTTG TCTTAGCTTG TCTAAAATTT CATTTTTAAA AAGCTGCGTG ACAAGCTGAT 2040

ATTCTT^TTT TGCAGACAAT ATCTCACTCT CAATTTGCTC CAGTTGCGAT TTATAAACCC 2100

GTAATTCATT TGCTGCCTCA ACATATTTAT TCTCCTGCTC AAGTACAGCA TGTTTTGCAA 2160

TTGCCTGTTT ATGCAACAGG CTCCTGAAAT CATCCAGACG GCTTTTTTCA ACCCTCGATA 2220

CATTTTCATA ACGGTTTATA CGGGCAAGTA TTGTTAAWCG CTCTGCTCTT TTCTTATCCA 2280

GATTCAGTTC TTTTTGATAC TTCTGATTTT GCCATGTGGA AAACTGTTCT TTTATCAAAG 2340

AAGTTAAACG CAGTACTTCC TCTTCAGATA CATTCTGAAA ATAAGGCTCA TCAGGAAGTT 2400

TCAGTTCAGG AAGTTTATTT AATTCAATTG ACCGGCTCAG AATTTGATAC CGAATTTGTT 2460

CCAGCCTGGC CTGTAACAGT GATGACTGCG TTTTTAACGT ATCAGCTTCA GCTCCCAGCG 2520

CTGTAAGCTT TAATAACACA TCCCCTTTCC GGACTGACTC TCCTTCTTTT ACGAYAATTT 2580

CTTTAACTAT CGAGTTTTCA ATAGGTTTAA TTTCTTTNTA CGCCCACTGA GTGTTAATTT 2640

CCCATTTGCA GTGGCAACAA TTTCCACCTG GCCTAAAACA GATAAAATGA AAGCAATAAC 2700

CAGAAACCCC ATAATAAAAT AAGCAACCAG ACGCGGCCGT CTGGATACCG GCGTTTCAAT 2760

TAATTCCAGA TGAGCGGGTA AGAATTCATT TTCGTCCTTT TCACGTACCG GAGTATCTAA 2820

CTGCTTCCGG ATTTTCCATG TTTCACTCCA GACAAGTTTA TAGCGCAACA GGAACTCGCT 2880

GAACCCCATT AACCATGTTT TCATATTCTT CTGTTCTTTC TGTTAGTCTG ACTGTAACTG 2940

ATATAAGTAA CTGTATAAAC TTTCCGGTTC AGAAAGCAGC TCCTTATGTT TACCCTGTTC 3000

AACAATTTTC CCTTTTTCCA TGACAATAAT GCGGTCTGCA TTTTTTACTG TAGACAGACG 3060

ATGAGCAATG ATTATAACCG TTCTGCCCTT ACATATTTTG TGCATATTGC GCATGATGAC 3120

ATGCTCCGAC TCATAATCCA GAGCACTGGT TGCTTCATCA AAGATGAGTA TTTTAGGGTT 3180

GTTCACCAGC GCCCTTGCAA TTGCGATGCG TTGACGTTGA CCTCCGGATA ATCCTGCCCC 3240

CTGTTCCCCG ACAATGGTGT TATACCCCTC ACGCAATTCA GAAATAAAAT CATGAGCACC 3300

TGSTAATTTC GCTGCATAAA TAACTTTTTC GACGGACATG CCAGGATTAG CCAGTGAAAT 3360

ATTATCAATA ATACTGCGAT TAAGCAGCAC ATTGTCCTGC AACACAACCC CCACCTGACG 3420

ACGTAACCAG TTAGGATCGG CCAACGCAAG ATCATGTCCA TCAATTAAGA CCTGGCCATT 3480

TTCAGGAATA TAAAAACGTT GAATTAATTT AGTTAATGTG CTTTTTCCTG AACCAGAACG 3540

TCCGACAATA CCAATAACCT CCCCCTGCTT AATACT 3576


(2) INFORMATION FOR SEQ ID NO: 80: 









(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3541 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 

TCAGCCCGGT GAGCGGGTTT GACAATTCCG CACTCACCAT TGGGCTAAGG GTTATCAGGT 60

GGGGTTAAGG AAATGGCAAA ACCTACCCCC GTCCAAACTC CAGTCGCTGC ACATTCACCA 120

TCCCTGGGTT CTCACCTGCG CTGACATCAA TTTGTGTCAC CCGCAGCGCA TATTTTTGAT 180

CCAGTGCTTT TAACCAGTTC AGCAGGTCAT TAAACACCAC AGGTTCTATC CAGACCTGGA 240

TATTCTCCCC GCGCTCGGCA ATCCGTTTGA TGACCACCGA GTGGGCGGAA GCTGTCACTG 300

ATGACCCGCG ATACCTGTGC TGGCGTTGTC GTGCCGGATT TTCGCGCCGC AATAATATCC 360

GGCGCGGCGC TCTTCAGTCG CGCGTTCATC GCCACCAGCT GCTGCAACAT CGTCTCCTGT 420

TGCTCAATCC GTTCGCTCAA CGGCTGCCAG ATGAGAACGT AATATCCGGC GCTAAACAGG 480

AACACTACCG CTGCCAGTAA CATGCCTTTT TCACGCGGCG AACGCCCCGC CAGGTGTTGT 540

GTCAGCCA3T GTTCGCCACG GCTTAACTGG CGTTCACGCC ATTGCTGAAA ATAGTGAATA 600

AATTTATCGC GTAACATGTT ATTTCGTCCG CAACGTTACG CCGCCGGAAA CCGCATCACC 660

CTCTTTCTGT AACGCGTCCT GTTGCACAAC ATAATCTGCC GCGAGTGCGC TACGAGTTTA 720

TCGAAGCTGG CAAAGTTCGC AGCCCGTAGC TGGAGGTGAA GCGTCTGGCG TTTTTGATCA 780

AAGGTGAAAC ACGCATTTCG ATGTCGGTAA GTGACGCTGA TTTCAGGGTA CTGGCGATCG 840

CTGAGAATTC TGCGAGCAGC CGGGTATCGT CGGTCTGTGG GCGATATTTT TTCAGCGCCA 900

TCGTCACCTG AGAGCGTAAA TTCACAATCC GCTTCTGCTC CGGGAATAGC GTTAAGAACT 960

GTTTCTCCGC CTGGGTGCGG CTTTGCGCCA CCTGTTCGCT GACGCTCCAT AACGTCACGC 1020

CCCGTTCCAC TACCAGCGCA ACCAGAATCA ACAATATCGG CAGAATCATC ACCCGCCAGC 1080

GCGCCCACTG TTTTCGGTAG CTGACACGAG GCTGCCACGG CCCTGTTAGC AGGTTCCCTT 1140

CCGGTTCGCC ATAAGTGGTA ATGGCGGGCA GAGCGTAACG GTCAGCGTTC GGCGTCTGCA 1200

CCAGCCCATG CAGACAGTTC TTCCGGTGCA ATGCCGACCA CGGTTAGTGA AAGCGGTAAA 1260

TCCTGCTCAT TGAGCTGTGC TCGGAACATG ACCGGAGCCA GCGCCCGCCC GGCGGTCCAT 1320

CCCCGGCATT CATCGATGCG GMAGATAACC CGTTGCGCAT CGCCAGCCAT AAACCCACAA 1380

GGAATGGACA TCCAGTCCGG CGCGACGATA GCGCGGGTGA TGCCGTTTGC CTGCAACCAC 1440

TGCGCAATGT TGCGCATATG CTGCTGGTGA ATCACAGCTA CGGTTGCCAG TTGCTGGTCG 1500

ATTTTCAACG GGGCGAAATG CAGTTCATCG ATATCCTGGT TCAGCTCTTC TTCCAGCAAG 1560

GCGGGCAGAA TCGTCGGTAT CTGCTTGCGG GGCACATCAG GCAGTTCAAC CTGCCAGACG 1620

CTGATCCATT CGCCGGGAAT GTAGAGTCGA ATCGCATCAG TTTGCAGCCA TTGCTGGAGA 1680

CATTCATCAG CAACGTCAGG CCAGATGCCG CACTCCACGT CGGCGGTACG ACGCTGCCAA 1740

CGGATGGGAG CGGAAMGNCA AAGCGGGAAA AAAATCTCAA GGATGGAACT CACTCACTTT 1800

CTCCTGTCTG ATGCCAGAGA ACAGAAAAGT GTTGTGGGCC CATGCGGACA ATTAACGAAT 1860

TCATCGTCAG TTCAATCTCA TTCACGGTGA TATCTGAACG CAGGCAGAAG TAATTGCTGT 1920

CCACGCTCAG GACGGTTTTT AGCTGTTTTT TAGTACGCTC ATCGACGTCA GCAAGTAACG 1980

GCTGTGCAAG AAACTGATCG ACATCTTCCC AGCCCTTCGC ATGACGTTGT TGTAATAACG 2040

CTCGCGCCTG AACAGGGCTT AACCACGGGT CAAACAGCGC CTCAAGAATC ACACTTTGCG 2100 

TGACGTCTAA GGTATTGATG TTGATTTGCT GGCGGGTCAT CGGCAGCGCA CAGACCAGCG 2160 

GTTTCAGTTT TTGATAAAGG CCGGCGTCCA TTCCGTGCAC CACGCGCATC TCGCTGATAT 2220 

CAGCCAGCGG TTGATTAGCG GCGTAAAACG GCACCGAACG GGCGAGATAC TCGCTGTCTT 2280 

CACGGCCCAG ACGCGTCTGC ACGCTGCGGT CTTCGTCAAT AAACTCCCAC AGGCTTTCGG 2340 

CTATCAGTTC GGCGCGATAA GCAGGCACAT CCAGGCGCGT GATCAGGGCA ATCAGTTGTT 2400

GTACCGCGAG CGGACGCGAC GCCGTCGTCG GCTGAGCGAG GGCATTCAGG TTAAAGCAAG 2460

CCTGTGCGTC ACGCAGAGTG ACGGCGATTT GCCCTGCGGC AGTGGGAAAA AACGCGGGCC 2520 

GGAAGCCCNA CGTGCGCCAG ATGCACGCGC TTTTCATTTT TCAGGCTCAG ACTGAGTGCG 2580

CTCAACGCCA GGCTTTCCGC ACTGGCGCTG TACCACAGCG CCTGCTGGTA CTCCTGCTGG 2640

TGCGCGTTCG CCCAAGTTGT TTCTGCATCC GCCCGGAAAG CGTGATGGTC ACCAGCATCA 2700

TAACCGCCAG CAATACCAGC ACCACGACCA GTGCCATTCC GCGTTTTGGT GGTGAGGTGA 2760

TCATGATAAT TGCGGCCCGC GTAACAACCA GATGCGTTCA ATTTCGCCCC ATTGTGGCGA 2820 

ATGCAGGGTT ATGCGTACTG CCACGGGGAT CGCCTGCACT GATGACCAGC TCTCCTGCCA 2880

GCGCGTGCCG TCGTAGAACT GCAAACGGAG CGAATCCGCC GGGATTAATT TTTGCGTTGT 2940

TGGCTTCACG CTGCCTGCCG CATCGGTCAG TGGCCAGGGT AACCGTTCGA GATAACCACC 3000 

ATGAATGCGG TAACCGACGG TGAGCAGATT ACTGCGCGGC AGACGCATCA ACGGATTAAC 3060 

CACGCCGCCA CGTACAAAAC GCATCCCTTC ACTCTCAGAC GCCAGCACGC CAGCGCCCGC 3120

CAGTAACGCT P.GTTCACGCT GGCCCTGATC GCCTCTTACC GGACGCGGCA TCATTTGTGT 3180

CAGATCGTGG GTCAGAAAAC TCATCGTTTG CTGCATGAGG TTTAGTTTTT GATCGTGTCC 3240

GGCGACGGCG CTATTCACGC GTGTAACCCG TTTGTCACCT GCTGCGCCAT CATTGCCAGT 3300

GAGGCAAAAA TGGCTATTGC CACCAGCATT TCCAGTAACG TGAAACCAGC GCGAGTCCTT 3360 

CTCACTGTTG GTGTCCCACG GCGCTAAACC ANGCGCGTCG TGACTGAATC ACTGACGAAA 3420

AGTCNTCATG AAGACTGACT TCAATATCCA CNGCATGGAG CAGCGCATTA NCGGTATTCA 3480

GTGGTGTTGG TTCGCCAGAA CCAAGCGGCT TTCCTGCCAT AATCGCTCTC GGCCCTGGGT 3540
G 3541

(2) INFORMATION FOR SEQ ID NO: 81:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1224 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:

GTACTGGACA TCTTTGATGA ACAAGCTCCT CAGTGTAAAT TGTACGTCTC TGATCGTAAT 60 

CTTCCTGAGG GCGTTGAACA TCTATCCGCT GAATTTATAC CCTATACTCC TGAGTCGGCA 120 

GATTTTCTGA TTCAACGTTT TTTC7CTGAA ACTATCCATA TTGAAAGTGC AATTGTTGTT 180 

ACAGCACTTA AAATTGCCAA TCAGATTGCT CTATCTCAAA ATGAGACCAA GAATGTGTAT 240

CTGCTTGGAT TTGATTTTAC GATAAAGGGG GGGTTCACTA GCAAGATCCC CTGCGCAGCC 300 

TTGCATGCCG AACCAGAATA TCAAGAGCGA ATTATCAGTA GTCAAGAACA GCTATTGCAG 360

ATGCTCCTTG CAGAAAAAAC ACGCCTGAAT ATCAATATCA ATCATGTTGG TAATAAGCCT 420

TACAGCGTAT ATTCTGTTGA TGCATTTAAT CAAGTGTTCG CTGCCCGCCA TCGTGGAGTC 480

GTGCTGCCCA CACATGCCCA GATTTCCACT ACATCATCAC AAAATGGGGT GAAGGTGATC 540

GCAGAGATTA CTACTAATCA CTTTGGTGAT ATGGACCGAT TGAAGTCAAT GATTGTAGCG 600 

GCCAAGCAGG CAGGGGCTGA CTATATCAAA CTGCAGAAGC GTGATGTTGA AAGTTTCTAT 660 

AGCAGGGAGA AGCTGGAGTC ACCGTACAAC TCTCCTTTTG GCACCACCTT TAGGGACTAT 720 

CGGCATGGCA TTGAACTCAA TGAAGAGCAA TTTTCCTTTG TCGACTCTTT CTGTAAAGAG 780

ATTGGTATCG GCTGGTTTGC TTCTATTTTA GATATGCCCT CGTATGAGTT CATTCGGCAA 840

TTTGAACCAG ATATGATCAA GCTACCATCA ACTATATCTG AACATAAAGA TTATTTGGCT 900 

GCTGTTGCTT CTGATTTTAC TAAAGATGTA GTAATTTCAA CTGGTTATAC TGATGAGGCC 960 

TATGAGCGTT TTAYCCTKGA TAACTTTACC AAGGTTAGAA ATATTTATCT GCTGCAATGC 1020

ACCTCGGCTT ATCCCACACC GAATGAAGAT ACCCAGCTAG GTGTGATAAG ACATTATTAT 1080

AATTTGGCGA AAAAGGATCC ACGTATTATT CCTGGTTTTT CCAGCCATGA TATTGGTAGC 1140

CTTTGTTCCA TGATGNTGTC GCAGCCGGTG CAAAAATGAT TGAAAAGCAT GTTAAATTTG 1200

GCAATGTGGC TTGGTCTCAC TTTGATGAAG TTGC 1234



(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6313 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 

ATGGGACCTT TCTTCAATGA TGTTGCCGAG TGGTTAGAGT CATTAGGTCG TAACGCTGTG 60 

AATGTTGTAT TCAATGGAGG AGATCGTTTT TACTGCCGTC ATCGACACTA TCTGGCTTAT 120

TACCAAACGC CGAAAGAATT TCCTGGTTGG TTACGAGATA TCCACCGGCA ATTTGACTTT 180 




GATACCATTC TCTGTTTTGG TGACTGCCGT CCATTGCACA AAGAAGCAAA ACGTTGGGCG 240

AAGTCTAAAG GGATCCGCTT TCTGGCATTT GAAGAAGGAT ATTTACGTCC GCAATTTATT 300 

ACTGTTGAAG AGGACGGTGT AAACGCGTAT TCATCGCTGC CGCGCGATCC TGACTTTTAT 360 

CGTAAATTAC CAGATATGCC TGCACCACAT GTTGAGAACT TAAAACCCTC GACGATGAAA 420

CGTATTGGTC ATGCAATGTG GTATTACCTG ATGGGATGGC ATTACCGACA TGAATTCACT 480

CGCTACCGTC ATCACAAATC ATTTTCTCCT TGGTATGAGG CTCGTTGCTG GGGGCGTGCG 540

TACTGGCGTA ACTATTTTAC AAAATAATGC AACGTAATGT ATTGGCTCGG TTAGTGAATG 600 

ATCTGGACCA ACGTTACTAT CTTGTTATTT TACAAGTTTA TAATGATAGC CAAATTCGTA 660 

ATCACAGTAA TTATAATGAT GTGCGTGATT ATATTAACGA AGTTGTATAT TCATTTTCGC 720 

ATAAGGCACC GAAAGAGAGT TATTTGGTGA TCAAACACCA TCCGATGGAT CGCGGTCACA 780 

GACTCTATCG ACCATTAATT AAGCGGTTGA GTAAGGAATA TGGCTTAGGC GAGCGAGTCA 840

TATACGTACA CGATCTCCCA ATGCCGGAAT TATTACGCCA TGCAAAAGCG GTTGTGACAA 900 

TTAACAGTAC AGTGGGGATC TCTGCACTGA TTCATAACAA ACCACTCAAA GTGATGGGTA 960

ATGCTCTGTA CGACATCAAG GGGTTGACGT ATCAAGGGCA TTTGCACCAA TTCTGGCAGG 1020 

CCGATTTTAA ACCAGATATG AAACTGTTTA AGAAGTTTCG TGAATATTTA TTGATGAAGA 1080

CGCAAATTAA TGCTGTTTAT TATGGTGTAA AATCAAAAAG CAATAGAAGG TCCGCATTCC 1140

TAAACGGTAG CAGATGATGG TTTTCATGGG CGTTTCAGGT TACTCAATCA GCCAACAACC 1200

GCAGCGAAAA CCCTGCTTTC TCGACCAGTT CAGGCCGGTT TTACCTCCAA TGCTTTCCGT 1260

CAGAACTGAG ATTTCAGCCA GTTGCCGGAT AAGTGTGTCG ATTTGCAGCA GTATACTTTT 1320

TCGTACAGCC AGAATGTGGC AGACTGAGGT GGAATAGATA ACGTCCGTAT GCCCGCTCAC 1380 

CACCTCCGGG CGGGAGTGTG TGGTATCTGA CATCATCATT TTTCCTTTCT GTTTATAAAT 1440

GAAAACGCCA GCCGTGTTCA GGCTGACGTC AGGGAAGTGA AATCGGGTGA GTGATCTTCA 1500 

CTGGTTCTGG TGCAAAAGTT ACTGTTGGCG CAGGGTACGG ATACCCTCCC TGGCCTGTTC 1560 

GATACAGGGC AACAGTGCTG CCGAATCTGT TTTATCCTCA TCGTTGTCGA AGATAATTCC 1620

CGATTCGCAG TCGATATTGT CCTGCAGCCA CGTAATCAGA ATATCCAGCG CTGTTTCCGT 168 0 

GGTTAATGAT TTCATGTTGT GAATTTCCGG ATTACCAGTC GAAAGTGGGT AAACCTGGCA 1740

GACATCTGGC ACTGGCATCC AGATGAATGA GACTGACACC ATAACGCCGG ATGAGTGTGA 1800 

CGACCAGACG ACGGAACGTA ACAGATAACC GGTACCGGTA AAATGAATCC ATTCTGATTC 1860

ACCAAAGTCA CTGGTCTGGT GTAACAGCGA GTACAGCCAG GCGTTGTCCT TTTCCGTGAT 1920 

ATGTGCGGTA CTGCAGCGTA TGCCGGAAAG AGTCGTAAAC GGTTGTGGAG TGCAGGTTGA 1980

CTGTTGGTCA GATTCATCCA CCACGCGGAG TGAATAACCG TTTTCAGCGA CCTTGTTAAT 2040 

CAGTTCAGCG AGATTAATAC CATCGACGTC AACGACAATG CGCCCCATAT TCAGTGCCTG 2100

TACGTTAAGG CCGGCGTCAG GGAAAGTTTC ATTGTTTCAC CTCCGGGTGC 2160

TTACCCAGGA TAATATTATT TACGGCTCTG TAATTGTCGC GGGTCATCAG GCCGGTCGCC 2220

CTGCGAGCCC GGAGGATATC GATGCTGTTT ATTAACTGAG AGCGGGTACA GGCGGTGAAT 2280

CCCGGCTG3T CGGTACGCAC CAGCGCGTAT TTTTCCACGA GAAAGTTCAC CGCATCACAC 2340

AGTGAAATGC CTGCCTCAAT ATGCTGCTCG ATCACACGTT CATCGGCAAA CGGTGTGTCA 2400

TTCAGTGTGA GGCCGTAGTG CTGGTCCAGC AGTCGGGACA GAAGTATCTG CCAGATTTCA 2460

ACAGGAGACG GGCGAGAACT GGCCGCCTGC CCGGGTAATA CAGGTAATGT TTTCATACTG 2520

AAGATTTTCC TGATATGCAG ATATAAAAAT GGGAAAGTGG CGTGGTGAAA ACACCAGGCC 2580

GTAGCAGAAG GCTATTCTGG AGAGTTAATT TTTCATTTCG GGCGTCGGAT AAACAGCCAG 2640

ATAAACGTAA CCACAACTGC TGAGGGTATC GGCTTTGCAG GTCAGCCCTT TTGCATACAG 2700

CGTGACGGTA TGCTGATGGC GGGGATTCAG TTCACCGCTG GTGAGCATGA GTTCCAGTTG 2760

TTTCATCAGC AGCGGAAAGG CCTGGTCCAG GTGGTACGCA TCTGCATTGC TGTATAGGCC 2820

TCTGATACCG GCGCGGTCGG CAAGGTAATG CAACCGGTTA CCCTCCTGCA CCAGACGTGC 2880

CCCGAAACAG GGCGTCACGG TGCAGGGCAG CCCCCACCAG GGGCGGTCGT GATTGTCGTC 2940

GGGAAGTGTT GTCCCGGGGA GTGTGTCTGA CACGATAAAA TCCCTACAGA AAATCGGCTA 3000

AGAATGCTCC GGTATTGGCG ATAATTCTGC TCATCAGAAT TCCCACTCAG TTCAGGGTGA 3060

CGCTCATCAG CCGGACATAC GGGCCAAAAC TGTCCTTACG GCGTTCAGCA AACACGGCCA 3120

GCACACCGGG AATATCCTGT ACTTCACGAC CGGTATACGC CTCAGCACTG CCGTGCCAGC 3180

GGTACTTACC GGTGCAGAAC GGAAATAGAC GGGATGCAGG ATGCTGTTGG TGAATACGCA 3240

TGGCTTCACC ACGGGTGATG ATTTTCATAA TGGGATACCT CTGAAGACAG AAGATAAAAG 3300

TGAAAACAGG TGTGATGTGG TTGTGACGGT GACGGGTTAA AGCAGACCGT GTTCCGCAAA 3360

GGAGAAAACC TGACTGCCAC CAACTATCAG ATGGTCCGGT ACCCGGATAT CCACCAGGGC 3420

CAGTGCCTGT ACCAGACGTT CCGTGATAAG GCGGTCTGCC TTACTGGGGG TGACTTCACC 3480

GGACGGGTGA TTGTGTGCCA GTACCACGGC GGCGGCATTG TGGTACAGGG CGCGTTTAAT 3540

CACTTCCCGG GGATGGACTT CCGTGCGGTT GATGGTGCCG GTGAAGAGGG TTTCACCGGC 3600

AATCAGCTGA TTCTGGTTGT TCAGATACAG TACCCGGAAC TCTTCACGCT CCAGTCCCGC 3660

CATCTTCAGA ATCAGCCATT CCCGTGCCGC ACGGGTGGAG GTGAAGGCCA CGCCGGGTTC 3720

ATGAAGATGG CGGTCCAGGG TTTTCAGGGC CCGCAGAATG AGACTGCGCT CGCCGGGCGT 3780

CATCTCTCCG GGCAGAAAGG AAAGTTGTTG CATTGTGCTT CTCTCCATTC AGTCGATGAT 3840

GCGCATAATG GCGCTGCATT CCGGATGCTG CAGGGCGTAA TCCCGCAACC GGTAATAATG 3900

GATCGTCATG GCATAACACT CCGTACGACA GGCATGATGA CTGTACGTCA TCAGACAGGC 3960

GGCAATGCCG GCGGCTTCCG GGCTCATTTC AGCGCGGTTA CCGTTCATGG CATTGAACAG 4020

TACCCAGTTT TCGTCATCAT CGTCATCCGG TTCGGGTGCC ATAAATGCCC CGCCGTTGTT 4080

CAGGGTGTAC AGATTCCAGA TACCACCGCA GTAGTCTTCG CACAGACGGT CCATCCAGCC 4140

GAAGACACGG GGCTCCAGGG TCACCCACTG TGGAATGAGG CCAAAGTGCT GCGGCCAGAA 4200 

GCTGATGCGC TGTTCATCAG GGACTATGGT GGCAACCAGC TGAGGCTGGT CATTCCCTGA 4260

TGCAGCGGTT ACGGAAACAG AAG3AGTGGT GGAATTATGC AAGACGGTTG TCATGAGATT 4320

ATTCCTTATA AAAAGTAAAT GAATGGAAGA AACCCCGGGG GAAGGGACAG ACGTGAGTCA 4380

GAACTGCGCT TTCAGGGAAA CGGCATCAGC GCATACTCTC CAGCAGCGTT TCAGCCATCA 4440 

CCCACAATGC GCGGTTGAGC TTAATGTCGG TGTCGATGCT GTGAATGGCA CGGGTATGGA 4500

TACGTTTTCC TCTGGCACTG CGACCGGAAA TTCCGCCTTT CAGCATATTC TCCTGAATGG 4560

TCTGATAAGC ACTCCACAGG TCCTTACCGT AATCCTCCCG GCGTCGTGGT GTCAGAATGT 4620

CGGCGGTGGT GACGGGCTGA TGTTCGTCAC CATAACGGTA AGTCAGTGCC GCCTGTGCCA 4680

GCGCCTGGCG TGCCGGTGGC GGCAGAATCA GCGACTGCAT GGCATCACGC TTTTCCTCAA 4740 

TCCGGTCAAA AACCCCCACC ACCTCGTAAG CCCCTTCAAT AACTTTCTCC ACCACATTTC 4800

CCCGGTGCGG AACACGCACT TCCCCCAGAG ACTGACCACA GACGCATCCG TTCTGGCAGA 4860

CGAACCTGAA GTAACCCGGC AGCATCTGGT AGCTGGAGGT ACCGTCATGA GAGTTGAGCA 4920

GAATAATTTC AGGGACATGT TCTCCGTTTA TCTCTCCGGC CCGCCGCAGA CGCAGCATGT 4980

GTTTGGTGTA TTCCCGGCGG TCCGGGTCAC GTACGCGGGT CTGGCAGGCG AAGAATGGCT 5040 

GAAAGCCTTC CCGCTGCAGG CTTTCCAGTA CGGTGATGGT GGGGATGTAC GTATAGCGTT 5100 

CACTGCGGGA GGTATGCCGG TCTTCACCGA AAATACCCGG TACATGGTGC ATCAGTTCTT 5160 

CGTGTGTCAG CGGACGGTCA CGGCGTATCT GGTTCGCATA ACCAAAACGA CTGGCTAGTC 5220 

GCATAATTTG CTCCTTATCG GTGGTTAAGA TTTACTGGTG TAATAAATGA AAAAGCCACG 5280 

TCTCCCGGAG AAGACGCGGC CTGACAGATG AAATGAATGA CGTTTATTGT CTGAGAAGCC 5340

CTTAACTGGC GAGCTGAGTA TTAAGCTGTG TTCCGGCATC ACCAGCGCAA CTGACCTTCA 5400

GCATTACGGA TAACCAGCCG GGAATATGTT CCCTGGTCAT CTTCAGTAAA CACATTGCGG 5460

TAAGCTGTTA TGACAGCAAC CGCCTGCCCG TATGAGAAAG ATCCTTCAGC CAGGACATAC 5520

TCTGTGTGTA ACCCGGCATA TCTGGTTTCT CCTGATAAAT AGCCTCTGCC ATACGTTGTG 5580 

GCAGAGGCTG AAGCATGAAA CTGACTTCAG GGATCAGTTA ACATTTTTTC CGGAAACGGT 5640

AATCAGCAGT GGATGGTAGT CCTGGGGATC GAAAACCGAT AACGGCAGAC TGACACGATG 5700 

GCCGTTACTT TCTTCAGTTG CTTTAATGAT TTCGGTTGTG GCGACATTTT CCACGCACTC 5760

CGTTTCCAGA AATGCGTCTG TGGTTCGCGT GGCATTACTG TCACCAAAGG CTTCCGTTTC 5820 

CATTTTTCTG GTCACCAGCG TCTGACCATA TTTGTCTTTG AGTTGCAGAG TGATGGTGAG 5880

GGGGCCAAAT CCTTCATCGT TTCCGCCATT ATCCAGCCGG AACTGGTAAG CACAAATATT 5940

TCCCGGGAGC CATATCGTAT CTGTATTGCG TATACTGATG TAACGTTGAT CCTGTGCCCG 6000 

GAGTGGGGCA GACCACGTTA ACCCCAGAAT GAAGGCGGTA ATCATGCAGG TTTTGAACAG 6060 

GTGAATCATG GTATTTACCT CTCTGAGTCA TGACGATTAC ACTGACAAAT CAGGTGATAA 6120 

AACGTAAAAG GGGCAGAATA GCCGTTATGC CGGTAACTCC GGGGGTAATG TTTCTTCCAG 6180

TCGGTTAACC ATATTGCCGA GATGGGATGC ATCATATTCC ATGACGGGGC GTTGCCTGAT 6240

GATACTGACC ACCAGTGGTT TGATTAACAT GTTGGTCGCG GCCCGTTGTT GTATACCGGC 6300 

GGCGAAAATG ATC 6313 
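
Each row of a sequence description gives up to sixty bases in groups of ten, followed by the position of the last base in that row, so the final position should equal the declared LENGTH. A minimal Python sketch of that consistency check follows, assuming the rows of a single SEQ ID NO are passed in order; the name `check_rows` and the accepted symbol set are illustrative assumptions, not part of the listing format.

```python
import re
from typing import List, Tuple

# Assumed symbol set: the four bases plus the IUPAC ambiguity codes.
# Anything else (for example a stray digit) is reported as a problem.
VALID = set("ACGTNRYKMSWBDHV")

# A row is "<groups of bases> <running position>". The position is read as a
# digit followed by digits or spaces up to the end of the line, so a counter
# printed as "4 20" is still read as 420.
ROW = re.compile(r"^(.*?)\s+(\d[\d ]*)$")


def check_rows(rows: List[str]) -> Tuple[str, List[str]]:
    """Join listing rows and report counters or symbols that look wrong."""
    seq = ""
    problems: List[str] = []
    for row in rows:
        m = ROW.match(row.strip())
        if not m:
            problems.append(f"unparsed row: {row!r}")
            continue
        bases = m.group(1).replace(" ", "").upper()
        position = int(m.group(2).replace(" ", ""))
        bad = sorted(set(bases) - VALID)
        if bad:
            problems.append(f"unexpected symbols {bad} in row ending at {position}")
        # Unexpected symbols are still counted, on the assumption that each
        # one stands in for a single unreadable base.
        seq += bases
        if len(seq) != position:
            problems.append(f"row ends at {position} but {len(seq)} symbols were read")
    return seq, problems
```

For SEQ ID NO: 83, passing its eight rows in order should give 432 symbols and an empty problem list, matching the declared length of 432 base pairs.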
(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 432 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 

CGTTGGCCGC TTGCGCAGAT AAAAGCGCGG ATATTCAGAC GCCAGCACCG GCTGCAAATA 60 

CGTCTATTTC AGCAACACAA CAACCAGCTA TCCAGCAACC GAATGTCTCC GGTACCGTCT 120 

GGATCCGTCA GAAAGTCGCA CTGCCGCCTG ATGCTGTGCT GACCGTGACA CTTTCTGACG 180 

CGTCGTTAGC CGATGCACCG TCAAAAGTGT GGCGCAGAAA GCGGTGCGTA CTGAAGGTAA 240 

ACAGTCACCA TTCAGCTTTG TTCTGTCATT TAACCCGGCA GATGTTCAGC CGAACGCGCG 300 

TATTCTGTTG AGTGCGGCGA TTACCGTGAA TGACAAACTG GTATTTATCA CCGATACCGT 360

TCAGCCGGTG ATCAACCAGG GCGGAACTAA AGCCGACCTG ACATTGGTGC CGGTACAGCA 420

AACCGCCGTG CC 432

(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3494 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 

GGGCTGATTA CGATTTTATC AATCTGTCTA TAGAACATGA ACTGAATGAA GGAATAGCTG 60 

GCAGAGAGAG GTTATGCCGG ACTGGCGGAT AACCGGAACC GGTTGGCAGA GGTGGTTACC 120 

CGTAAATTGC AGGACAGCTT TTATATGAAC TTTCCTGGGA TGCGCTGAAC ACGGCATACA 180 

GTGAACACCC AGAGTGGTTT TCCGGGCTTG TCTCCGGGGA TGAGAATTAA AAAGTGGATT 240

ATGCTGCTAT AGCGCGGCGT GATTTCCTGC AGGGATTTCC ATTTATAAGA ATACGCCGCT 300 

TCGGGGAATC TCCGGTTCTC CTGAGAGTTA CGATTGTTTT TTTACTCAAA TCCACAACAC 360

CTGAACTGGA ACTTGTGTTG CATCCCTGAT TGTTACTCTG CAGGAAACAT CTTTTTTACC 420

ATCAAAGGAT GACTGTTTTC CTTTCTCCCC TCCGTAAAAC ACAACTTCGA TCACATTTCT 480

GACATTTTTT CCAGATTTTA CATAACAGGA TTGTTTCTGT ATGTTTTTTA TCTGGTGTAA 540

ATTTCAGCAC TGACATTCCG CTTACGTTAA TTTACACTGA ATACCCCACG AGGAGAATAT 600 

GCAGCACCGG CAGGATAACT TACTGGCGAG CAGAACGTCG TTGCCTGGTA TGGTTTCCGG 660

TCAGTGCGCA TTTAAGCTCC GCACTTTCTC TCCGGTGGCA CGCTATTTTT CCCTCCTCCC 720

CTGCCTTTGT ATTCTTTCGT TTTCGTCTCC GGCAGCCATG CTGTCTCCGG GTGACCGCAG 780 

TGCAATTCAG CAGCAACAGC AACAGTTGCT GGATGAAAAC CAGCGCCAGC GTGATGCGCT 840

GAAGCGCAGT GCGCCGCTGA CTGTCATACC GTCTCCGGAA ATGTCTGCCG GTACTGAAGG 900

TCCCTGCTTT ACGGTGTCAC GCATTGTTGT CCGTGGGGCC ACCCGACTGA CGTCTGCAGA 960

AACCGACAGA CTGGTGGCAC CGTGGGTGAA TCAGTGTCTG AATATCACGG GGCTGACCGC 1020

GGTCACGGAT GCCGTGACGG ACAGCTATAT ACGCCGGGGA TATATCACCA GCCGGGCCTT 1080

TCTGACAGAG CAGGACCTTT CAGGGGGCGT ACTGCACATA ACGGTCATGG AAGGCAGGCT 1140 

GCAGCAAATC CGGGCGGAAG GCGCTGACCT TCCTGCCCGC ACCCTGAAGA TGGTTTTCCC 1200 

GGGAATGGAG GGGAAGGTTC TGAACCTGCG GGATATTGAG CAGGGGATGG AGCAGATTAA 1260

TCGTCTGCGT ACGGAGCCGG TACAGATTGA AATATCGCCC GGTGACCGTG AGGGATGGTC 1320

GGTGGTGACA CTGACGGCAT TGCCGGAATG GCCTGTCACA GGGAGTGTGG GCATCGACAA 1380

CAGCGGGCAG AAGAATACCG GTACGGGGCA GTTAAATGGT GTCCTTTCCT TTAATAATCC 1440

TCTGGGGCTG GCTGACAACT GGTTTGTCAG CGGGGGAGGG AGCAGTGACT TTTCGGTGTC 1500 

ACATGATGCG AGGAATTTTG CCGCCGGTGT CAGTCTGCCG TATGGCTATA CCCTGGTGGA 1560

TTACACGTAT TCATGGAGTG ACTATCTCAG CACCATTGAT AACCGGGGCT GGCGGTGGCG 1620

TTCCACGGGA GACCTGCAGA CTCACCGGCT GGGACTGTCG CATGTCCTGT TCCGTAACGG 1680

GGACATGAAG ACAGCACTGA CCGGAGCTGC AGCACCGCAT TATTCACAAT TATCTGGATG 1740

ATGTTCTGCT TCAGGGCAGC AGCCGTAAAC TCACTTCATT TTCTGTCGGG CTGAATCACA 1800 

CACACAAGTT TCTGGGGGGT GTCGGAACAC TGAATCCGGT ATTCACACGG GGGATGCCCT 1860

GGTTCGGCGC AGAAAGCGAC CACGGGAAAA GGGGAGACCT GCCCGTAAAT CAGTTCCGGA 1920

AATGGTCGGT GAGTGCCAGT TTTCAGCGCC CCGTCACGGA CAGGGTGTGG TGGCTGACCA 1980 

GCGCTTATGC CCAGTGGTCA CCGGACCGTC TTCATGGTGT GGAACAACTG AGCCTCGGGG 2040

GCGAGAGTTC AGTGCGTGGC TTTAAGGAGC AGTATATCTC CGGTAATAAC GGTGGTTATC 2100 
TGCGAAATGA GCTGTCCTGG TCTCTGTTCT CCCTGCCATA TGTGGGAACT GTCCGTGCAG 2160 
TGACTGCACT GGACGGTGGC TGGCTGCACT CTGACAGAGA TGACCCGTAC TCGTCCGGCA 2220 

CGCTGTGGGG TGCTGCTGCC GGG2TCAGCA CCACCAGTGG CCATGTTTCC GGTTCGTTCA 2280

CTGCCGGACT GCCTGTTGTT TACCCGGACT GGCTTGCCCC TGACCATCTC ACGGTTTAGT 2340

GGGGCGTTGC CGTCGCGTTT TAAGGGATTA TTACCATGCA TCAGCCTCCC GTTCGCTTCA 2400

CTTACCGCCT GCTGAGTTAC CTTATCAGTA CGATTATCGG CGGGCAGCCG TTGTTAGCGG 2460

CTGTGGGGGC CGTCATGAGC CGAGAAAACG GGGCCGGAAT GGATAAAGCG GCAAATGGTG 2520

TGGCGGTCGT GAAGATTGCC ACGCCGAACG GGGGCGGGAT TTCGCATAAC CGGTTTACGG 2580 

ATTACAACGT CGGGAAGGAA GGGGTGATTC TCAATAATGC CACCGGTAAG CTTAATCCGA 2640

CGCAGCTTGG TGGACTGATA CAGAATAAGC CGAACCTGAA AGGGGGCGGG GAAGGGAAGG 2700 

GTATCATCAA CGAAGTGACC GGCGGTAACC GTTCACTGGT GCAGGGCTAT ACGGAAGTGG 2760 

CCGGCAAAGC GGCGAATGTG ATGGTTGCCA ACCGGTATGG TATCACCTGT GACGGCTGTG 2820

GTTTTATCAA CACGCCGCAC GCGACGCTCA CCACAGGCAG ACCTGTGATG AATGCCGACG 2880

GCAGCCTGCA GGCGCTGGAG GTGACTGAAG GCAGTATCAC CATCAATGGC GCGGGCCTGG 2940

ACGGCACCCG GAGCGATGCC GTATCCATTA TTGCCCGTGC AACGGAAGTG AATGCCGCGC 3000 

TTGATGCGAA GGATTTAACT GTCAGTGCAG GCGCTAACCG GATAACTGCA GATGGTCGCG 3060 

TCAGTGGCCT GAAGGGCGAA GGTGATGTGC CGAAAGTTGC CGTTGATACC GGCGCGCTCG 3120 

GTGGAATGTA CGCCAGGCGT ATTCATCTGA CCTCCACTGA AAGTGGTGTC GGGGTTAATC 3180

TTGGTAACCT TTATGCCGGC GATGGCGATA TCACCCTGGA TGCCAGCGGC AGACTGACTG 3240

TCAACAACAG TCTCGCCACG GGGGCCGTCA CTGCAAAAGG TCAGGGCGTC AGCTTAACCG 3300 

GCGACCATAA AGCGGGAGGT AACCTGAGCG TCACAGCCGG AGCGATATCG TTCTGAGCAA 3360 

TGGAACGCTT AACAGCGACA AGGACCTCAG CCTNGACCGC CGGCGGCAGA AATTCACTCA 3420

ACAGAATGAA AAACTGACTG CCGGCCGGGA TGTAACGCTT GCCGCGAAAA AACATCACAC 3480

AGGGTTACCG GCCA 3494



(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9319 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 

GNCCCAAGCT TAGGTTCGCG GCCGCAGTAC TGGATCTATT GCCAGCTTCA CCGCCAGACT 60 

GTCAGTCAGT ACATCACCGT ATTTCTGCTG GCAGGTTGCC GGGCGGCTGC ACAGTCACTG 120 

ATCAGTTGCT TCTGCTGTGC CGTACTCAAC TCTTCGTACT TTTTGATAAT ACCGCCGCAG 180

TCACCGCCTT TCGCCTGACA GGACTTCATT TCAGCAGAGC AGGCATCTAT CTGCTTATTG 240

CTCAGGTAGT TATTCTCAAC AACAACCACA GGGGATTAGA AGCCTTTTAG CCTGAAATAT 300 

TTTGCGAGAG CACATGCAAT ACCAATAAAT GAGCCAATCA CACATCCGAT AAACAAAACA 360

TGCCGAATCT CTTTCAAACT AATATTTAAA TTACCTGTTA TCAACCACTC CACCAAAGAA 420

AAAAACACAT CAATACATAG GAATGACACC ACTATAGAAA GAAATGCGAT TATAAAAATA 480

ATAAACAATT CTGATAAGTG CTGAGAATTG CCGCTCATTT TTTCACCTCC GGAATGTAAG 540

ACTCAATCTT TTTACCTTCA TACTCAGAAG CAAAAGAAGC CGACACATCC CCAGCTATAC 600 

CAGGAATCCT ACTGGGTGTC ATTTCTTTTG ATAGCCCCAA TTCTCCTTTA ATATCGGTAT 660 

ATTTTTGAAG TGTTGGATTA AATTTCGGGT CCCAGCCGTC TTTTAACCAG TTAGCACCAC 720 

TATTAATGCC CCATGAAAGG CCTTTACCAA TGCCATATCC AATAGCAGAA CCAGCACCAT 780 

TGATCAACGC ACCAGATGTT GGGGCTTTTC CTTCGAGCCA GTTTCCTAAT GCTCCTCCAG 840

TTGCATTCCA GCCAACTGTG CCTACAACTC CATTCCCTGC ACTAATCACA TTAAGCCAAC 900 

CACCGATAAT CGCTGTTGTA GGATCTATAG TTCCATCCGT CAGATAGCTA ACACCTGCAT 960

TAGCTCCTGC CCCTAATCCC CACATGGCCT GAGCACCGCC AGTAAGAGAG CTACACTACC 1020

AGTGGCCAAC GCTCCGGCAT ACGCTTTATT GACTGCTTCT CCTCGCTTAC AGGCTTCACC 1080 

GCCTGGGGCA TCGTTACAGG AAAGTACATC TGCGCCATGC GTCTGAGCAG CTTTGCTCTG 1140 

CTCGGACTCT GTGCCACCAA CCAGGTTATT CTCAGCAATG TTCTTCCCGA CACCAGCCCC 1200

AGCAGCCGCG CCAGCCACAT CGCCACTGGC AATGCCGCCA GCCATACCCG CTGACAGCGT 1260

TGCCAGCGTG CTTACGGTTT GCTTCTGATC TTCTGTCAGT TTCGACGGAT CTACGTCCGG 1320 

ATAGAGGCTT TTCGCAATGG CTGACGAGAT CACTTCACCA GTACCCGCAC CAATTGCGCC 1380

TGCTGCCGCA CTGTTGCCCT GAAGGGCTGC TGTCACACCA CCGAGAATGG CATGGGCAAT 1440

GGCTTTTGCC GCTGTATTGT CATCAATACC CGCGTGATGA CCGATGATGT TCGCCAGCTC 1500 

CGGCGCCGAA GCTCCGGCCA GAGCACCTGC TAAATTACCC CCCGCCAGCC CCTGAAGTGC 1560

AGCCGTTGCA GCCTGGATAC CGCGCTGCAT ATCGCTGCCG GTACCATACT TTTCCTGTTC 1620

CTTTTTGTAT TCCGGCGTAT CACGCAGTTT TGCCAGATAT GCCTGCCGCT GTTCTTCCGT 168 0 

CGCATCCGCC GGAACAGGCC CATATTTATC CTGCGCAGCT TCAACGCATT CAGTTCCCCC 1740

TGCGTCCGCG CAATATCCGC CACCTGACTG CCTATGTCAC TGATAAGCCC CACTGTCTGC 1800

AGACGCCTCT GCTCCTTCTC CTTGTCAAAT ATCGGGCTGA TACTGTCATT AGCGTGCGCA 1860

GGGTCACGGC TCAGGTTCGC CAGATTCTCC TTCTGATTGC CCCTGTCCCG GATGGTGATA 1920

GTGCCTTCTG CCACTGCGGC CTGAGTCGTT CCTTCCGCAT GTCCGCTGTG ACCTCCGGCG 1980 

GATATCATGC CACCCGGCAT GTTACCCTGA AATTTATCCC CGAAGCTGCC ACCACCGCTC 2040

AGACTGATTC CACTGTGACT GACTTTATAA TCCGCTTCGT TGTGAAGGTC ACTGAACCCC 2100 

AGCGTTCCGG TATCCAGGTG GTTTTTATCC GGTGTGGCAG TGGAGGCAAT CACCGCACCA 2160

TCCAGTTGGG TATGTTTACC CACTGTGATG TCGAAGCCGC CGTCACCGGC AAACATTCCG 2220

GTTTGTTCAG CAACGGAGTC AAAGCGGCTC TTCATCTTAT CCCGGGAGGC AGCGATGTAA 2280

CCTGAGCGGG TCATGGAGCC AAA3GTAAAA CTGCCGCCGG CASCCACGCT GGTCTGTTTA 2340

CTGTCGTACT TACTGGTGTG CTGGTGGCTG CTTATCAGCA GGTCGTGGCC CACATCGGCG 2400

ATAATCCTGT TGCCGTTGAC CTGAGCACCG TTCAGTACCG TATCCCGACC ACTGTTGATG 2460

GTGACGGTTT TACCGCTGTC TGTTGTGGTT TCAGTCCACT CAGTACCGTT ACCTTTCTCG 2520

CTGCCTTTTG CCGCATTAAC GCTGGCAAAG ACACTGATAC CGGCACCTTT ACCTGCACCG 2580

ATACTGACAC CCACGCCACC GCCACTGCTG CTGTTCCTGC CCGTTGTTTT TTGTGTGTTT 2640

GCCGCGCCAC TCAACAGAAC ATCATTCGCA GCATCCAGGT TTGTGTTACC ACCGGCCTTA 2700

AGCTGGCTTC CGGCAATCAC AATATCTCCG CGGTTATCGC CCCTGTTTTT ACCGGTTGCG 2760

ACAACAGACA GATTATTCCC GGCATTCAGC GTACTGCCGG ATACTGTGTC ACTTTCAGAA 2820

TGTTGTTGTG ATTTCGATTT CTGGGTGGTG AGGGACAGGC TGACTCCCGT CGCATTCGGG 2880

TCACCGGTTG CGGAGGCCAT TGCCGCAGGC TGTCCGGCCT GCACACCAGA CAGCGCTGTC 2940

TTTGTAGCCT GCAGGGTTTT CAGACGGCTG TCACTGCTCT CCTTCGTCTC CTGTGCACTG 3000

GTGACCGCAT TATTGATGGC ACTGCCCACT GTGCCGGAAA GGGCAACCGT CAGCCCGCTT 3060

TTCTTCTGCT CAAATTTTTC GTCCACAGTA CGACGGTCAT GCCCCGGGTC AACCACCACA 3120

CTGTCACCGG TAATGCTGAT ATCCCGGTTC GCAATCACAT CCGAACCGCT GATATGAGCC 3180

TGTTTGCCCG CGGTAATACT GACATTACCG GCAGTGGAGC CGATGGTACT GGCACTCTGA 3240

CTCTGCGTTG TCCCGGCCTC GCGGCGGTCG TGCGTTGTCT TACTGCTGCC AATGGTGAAG 3300

CCAATACCGC CGGTACCCAT CAGACCGGAT TTCTTCGTTT CCTTAAAGCG CCAGGACGTA 3360

TCTGTACTGG TGGCAGCAAG AACATCAACA TGGTTACCCG CCGCCAGTGA CACATCCCGG 3420

TCAGCCACCA CATCCGAACC CTCTACCGTC AGGTTATCAC CGGCGTTAAC GGTCACGCGG 3480

TTCCCCGACA GCAGGGAACC TGYTTCACGG GAGGCACTGT CCTCACTGAT GGTGTGGGTG 3540

GTTTTCTTAC TGAGAAAACC TCCGCTTTTT TTCTTCGTTT CCAGATAGTG ATAGTCACTT 3600

TCTGTCGCCG TGGTCAGGGC AACATCACGA CCGGCATTCA CGCTGATATT GCCGGTTGCG 3660

GTAACGGATG ACGCAACAGC GGTGATATCC CGTCCTGCGG TGACGGTGGT GTCACCACCK 3720

CTGGCGATTT CCGTTCCCTG CTGACGGACT GTCTCGTTAA TCTCTTTCTT TTTCTTCGAC 3780

GTATAGCTGT CGCCTGCGCC GGCAGACTCT GCCACCAGGT TCACATCACG TCCGCCCCGG 3840

ATGACCACGT TATTTTCCGC AGCCATACCG GCAGCCTGAC TGGCAATATC ACGACCGGCA 3900

ACAAGGAGGA GGTTATCGCC CGCCGTCACC GTGGACACAG CTGCGTGGCT TTCATGACTT 3960

TCTGACCTGC CGTTGCGACT GTTTTTGCTT TCCCTGACTG CATTCAGACT CAGGTCGTTA 4020

CCTGCAGAAA GCAGGGCGCT GTGCCCGGCA GAAACAGAGG ATGCTGTGAC ATCCAGATTA 4080

TGGCCTGCAG CCATCGCCAG GTTACCGCCG GCGCTGATGC TGCTGCCCTG TGAGGTGGTG 4140

GATGATGAAC TGTTGTCATC AGTGTGCCAG AAACCGGACT GACTTTTGCT CCCGCTTATC 4200

AGGTTTACGG CAATGTTGAT GTCATTACCC GCAGACATTC CAAGGTCTCC ACCGGACGAG 4260

ACCGTTGCCC CGGTAATATC AATGTTTTTC CCTGCATCCA GTGAAAGTGA ATCAGTGCCT 4320

TTAATGGTCG CAACCGGACC GGTGTCCGTA CGGCTGAGAT GCACACCACC ATATCGGCTG 4380

TCACTGCCCG CATTCCATTG CTGACGCCGG GTGATATTGC TGATGTTGCC ACTCACGCTT 4440

TCCAGTTGTA CGGTTTTACC GCTGATGACT GAGCTGATAT TGCTGATATC CCCGATGGCG 4500

CTCAGGTCCA GGCTACCGCC CGCGCTTATC AGCCCTGCAT TCAGGTTGTC GATATAGCCG 4560

GTACTGTCGA GCGAAAGGTC GTTCTGTGCG TTGATGCTGC CGCCGCTGTT GGTGATATTG 4620

CCGTCCGCAA GCTGCACGTT GTTCCCGCTG ATAACGCTGC CGTTATGCAG GGTGATATCT 4680

TCCGGCGACA GATACAGTTT CGGGACCATG ACTGTCTGTC CGTTGATGGT GACTGACTCC 4740

CACCACAGCA TGCTGCCGTC AAGCTGAGCA ATCTGTTCAG CTGTCAGCGC CACACCAAAC 4800 

TCTAATCCCA GTCCTTTCTG TTGTCTGGCC GCGTTATCCA TCAGATACCG CATCTGTTCC 4860

GTGTGTGAAC CCAGTCCGTT GAGATAACGT GAACCCGTCC GGCTCAGCAC CGCGTTACTG 4920

ACATACCGGG TATCAAAGAC CGCATCCCCC AGGAAACGAT AATCTTTTTC CGGTTTCAGC 4980

CCGAGGCGGT CAAGAAAATA CGATGAGCCC AGAAACTGTT TTTCATCGGT ATACGACGGA 5040

GCCGTTTCAC GTGGCGCCTG ACCCGGTTTC GCTCCAAGAA GCTCATACAG TCCGGCAAAC 5100 

AAATGGCTGT CCACCTGTCC GAGACCATCC AGTTTCGGGT TCACCGTAAT CAGATACGGA 5160 

CTGTCCGGGT CCGTGGACGG AACCAGGTAT CCATTGTTGC GGGAAGGCAG TGGCCAGTCA 5220 

TCACTGATAC CGGTCTGACC GGTCAGTGGC GAACCTCCGG CAATATTTTT CAGGGCACCT 5280 

GCCAGTTCAT CGTGCCATTG CGGAGAGCCA ACCACCACCG GCTCATACTG CTGCAGCGCT 5340

GTCTGTGTCA GACTGTCTCC GCCGGTCTGC TGACTTAACG TATTCAGTAC AGGTGCAGAG 5400

ACCACCGGAC TGACACTACC TGCATGTGCA GTGGTTGTTC CGTTATTGAT ACTGCTGGTA 5460

AAACGGGTCT TAACATCCCC GCCCGCCTGA ATAACGGAAT AATACGTCTT ACCGGGCGTG 5520 

TAATCTTTTT CCCGGCCATC CAGTGAAAAT CTGATGGTAT TGTTTTCAAA TTCCGGTGAC 5580 

AGCAGGGGCA GTTTATCCAG AGAGCCTGTT GCATAGCTAC CGTAAAACGT TTTCGGGTCG 5640

TAGCGGTATA CCAGATATTC ATTCTCTGTC CCCGTCTGCC AGCTCTGATT GCTTAACTCT 5700

CTGCCCGAGA GTGCGATATC CCCATTCGCC AGGATAAATG ACGCCCGGTT TTCCAGTCGT 5760

TCAGCCTCAG CAGAAAGATT ACGCCCTGAC GCAATGCGGC CTGCCGGATT ATCAGCACCG 5820 

GTTACTGTTG TGATGTTCTG GCTGCTGAGA AAGCGCTGTG TGGCACTGTC AGCAAACGGA 5880 

GCGTAATAAT AAAGCGTATC CATTGTGATA TTGCATGCCC CGTGCCCGTT GCAGGGCGTA 5940

CCGTGCTGAT TTTCAACTTC ACGGGTGAAA TAGCCATAGC TGCCGTCAGG AAGAAGGGAA 6000 
AGGGGAATAT CAACCAGAGC ATTTCCCATT CCCTGAATGG ATGAGGGGTT AGTCCGGGTT 6060

GTTGTTGTGG CAGAAAATCC CTCCCGCTGG TTCAGAAGAT GCCCGGTTCT TACAACAATA 6120 

TCGCCCTGAT GCGTCTCAAT ATTCCCGGAA GTATTGATAA TCTCTGTGTT TGCACCGCCG 6180 

GAAGCATCCT TCTGTACCCA CAGACTGTTG CCGGCCAGGA TATCACCATG CTGGTTATGC 6240

AGACGGTCTG TAAACAGCTT CAGGTTATTC CCCGCATAAA TCAGCGCACT GTTCAGCAGG 6300

GTACCGGCCA CATTCATTGT CAGACTGCCT GCCGTGCCGG TAAAACCACT GATGGTGATA 6360 

TCACTCCGGC TGTTCAGACT CACATCGCCA CCGGCCTGAA GTGAACCCGG TGCGTTAAGG 6420 

AAAAGACGCT GTGCGCTGAA AACACTGTTG CCTTTACCGG CAGTCAGCGT TCCATTGTTG 6480

GTGAATGCCT CTCCGGCACC GAGCACCATG GCATCACCCT GCATGACACC GCCGTTGGTG 6540

ATGGCATTTT GCGACGTGAC GGAAAGGGTT TTCCCTGCGG CCAGGGTACC GTAATTCGTG 6600 

AGGGCAGCAA TCAGTTTCAG TGTGACATCA CCGGTGGCCA CCACCTGCCC CTGACCACTG 6660 

AAGTCCTGAG CGTCAAGCAG CAGGTTGCCT GCACTGTACA GCCGCCCTGT ACCATTTTGC 6720

AGCAGTGAAC TGCCCTTGAC GCCAAGCCCG GAGGTTCCCA GCAGGGTACC GCTGTTGCTG 6780

AATGTGTGGT AATTCACCAG CAGGTCCGCA CCCTGAAGCG TACCGGTATT ATTCAGCGTG 6840

GTTCCTTTAA CGTCGGCACT GCCGGTGGCA AGTACGCGTC CGCCGTTGAC AGTATTCACC 6900

ACATCCAGCA GCAGGGTGGC AGCCTGTACC AGTCCGCTGC CGGTGTTCGC CAGCACCTGC 6960 

GCCGTCAGCG TGAGGTTACT GCCGGAGAGG ATTTTGCCGT CGTTCTGCAG ACGGTCAGTG 7020

GCGTTCAGGG AAACCCCGCC ACGACCCTGT ATCGTGCCCT GGTTACTCAG GGTCGCAGTA 7080

CTGACATTCA GTGCATTCCG GCTCATCAGA ACACCACCGG AACGGTTGTT CACGCCACCG 7140

GAGGCGGCCA GCGTCAGCGT TTCGCCCTGC AGATGCCCGC CGTTTGTGAG TTGTCCTGCC 7200 

GTGATGGTGG TGGCATTTCC CTGTAATTGC CCGTCGTTTG TGACACTGTC TGCCTTCAGC 7260 

GTCAGCACAC CTGCACTGAG CAGTTTTCCG CTCGCGTGAT TGTGCAGCGT CTGATTCACC 7320

GTGAGCGTGA GAGCATCCAC ACCGGTGATG TCACCCGCAC TGGTCAGTGA GTTCGCCTTC 7380

AGGGTCAGAT TTTTTGCAAT CCATTGTCCG CTGTTGCTTA AATTCAGTGC ACTGAGCGCC 7440

ATTTCACCGT TCGAGGTGAC TTTGCTGCCT GCTGTGCTGA CGAGCTCACC CGTCAGACGT 7500 

GCAGTCAGGC TGTCAGCCGC CTGGATCGCC CCGCTGTTTG CCAGACTGTC TGCGGTGATC 7560

AGCACCCGTT TGCCCTGCCA GTGTCCGGAA CTGGTAATAC TGCCTGCGGT GATTGTCAGA 7620 

TCGCCGCTGG TCAGCAATGA ACCTCCGTTA TTCATCAGCG CAGGTTGAGG GGATGCCATA 7680

CGGGCGGCAA GCGTCAGCGC GGCTATCCCG GTGAGCGTGC CACTGTTGGT GACACTGTTC 7740

TGGCGAATCG TGACATGGTT ACCCTGGACA GTGCCGCTGT TATCCAGTGA GTTTCCATCA 7800

AGGGAGAGCG TGCCGGCCGA AAGCAGACTG CCCCGGTTGT CCATGGTGGC TGCTTTCAGC 7860

GTGGTGTCAC CCTGGCTCAT GATATCGCCG GTACTGGTCA ACTGACCGGT TGCCGAAGCA 7920

GTAAGGTTAC CGGTTGCCAG CACGGAACCA CTGTTCGCCC AGTTGTCCCG CYTGGACGGT 7980

GAGATTCTGT CCGTGCGTGG TCCTGGGGTA TGCAGTGTTT TACCCCGGAG GGTGAGGTCG 8040 

CCCGCGGTCA GGCAGCGCCC GTTACTACCC TGTGAGAGGG TGTCGCCAGC AAGGGCCAGT 8100 

GCACCGGCGC GCTGCAACAG GCCGTCACCA TCCAGCGTGG TCGCCCTGAG GCTCAGCGTG 8160 

TCAGCGATGA TTTTTCCGGG ATTGCTGAGG GAGACAGCAT TTAACATTAA ACCATTATCA 8220 

CCGGTGATAA GCCCGCTGTT GCGGATGTCC GGTATATCCA GCGTCAGGTC TGCAGCACTG 8280 

TACAGCGTGC CGTTCTGCTG ATTATCAAGC CTCTGTGTGT TAACGGTAAG TGAGGCCTCC 8340

CCCTGCAACA GACCGCTGTT GGTCAGGGTC TGTGACTGTG TATTCAGGGC GGAACCAACA 8400 

AGTACGCCGC TGCTGGTCAG TTCCGGCGCA CTGAGGCTGA GCGACGGGGC ACTGCTTTTC 8460

CCGCTGTGGG TGAGCTTTTC ACTGGCGTTC ACCACCATGG TCTGTTGTGC TGCCTGCGTA 8520

CCTGCAAGAC GTGCATCTCT GGCGTTGATG CTGAGATTTT TACCGCTCTG AAGCTGTGCG 8580

CCCGCTGCGG TACTCAGTTT GTCTGCGTGA ACCCGGAGGG TGTCACCGGC ACTGTTTTCC 8640

CCGTCCAGCG CCACTGTTGT CACATTCAGC GTCATCGCAG CATCGCTGTG GGTGACCGAT 8700

TTTTTACCGG AGGTGAGCGC CTGCGCACTG ACCGTCAGCC CTTTGCCGCC GGACAGCACA 8760

CCGTTCTGTG TCACATCCTG CGCCTTCAGC ACCAGTACAT CATCGCTCAC CAGCGAACCT 8820 

GTACTGGTCA GTTTCCCACT GGCCGTGATA TCCACTTTGC CCTTCGCGCC AGTGCGGCCG 8880 

CTCTGGGTAA AGTCGCGGGT ATTCACGGTC AGGGGACCGC CACTGAGCAG GGAGCCACTG 8940

TTGCTGAGCG TTGTACTGCC GAGCGTCAGG GAAGCCCCCT GAACAGCACC ACTGTTATTC 9000 

AGCGTGCCGG CATCGAGTCC CGCATGACCT TTCGCCAGCA ATATTCCGTC CTGTGTCAGC 9060 

GTGGTGGCGC TGGCCGTGAG ATTCTGCCCG GCGGTTATCT GTCCCTGTGT TGTCAGCGTG 9120 

TCACTGGCGA CAGTCACGAT ATCGCGGGCC GCGTTAATCT GGCTGGCGGT ATCCTGTGTG 9180 

ATGTTTTTCG CGGCAAGCGT TACATCCCGG CCGGCAGTCA GTTTTTCATT CTGTTGAGTG 9240

ATTCTGCCGC CGGCGGTCAG GCTGAGGTCC TTGTCGCTGT TAAGCGTTCC ATTGCTGAGA 9300 

ACGATAATCG CTCCGGGCT 9319 
(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 551 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 
ATGAGGCGAT TAAAGCAACA TTGGGCAGTG ATAATGCCCC CACCCAGCCA CCTAACGCAG 60
CGAAGAGTAA TACATCGCCC ATGCCTAATG CTTCTTTACG CAGAACTATT CCGGCTATCC 120 
AGCG3AGGGA GTAAAAAGTG ATAAATCCCA CCAGTACGCC GGTAACTGCG TCTTGTAGCG 180

TTAACGGACT CTGTTGCGCC CATGCTGCAA TCAGCCCGGT CCACAATACG CCCTGAGTAA 240

AAACATCGGG CAGCCATTGG TTGTCGAGGT CAATGACGCT CGCGGCAATC AGCCAGGCGG 300 

ATAATATCAT CACCGCCAGC CCCCATCCAC TTTCTGGCCA CACCAGACTC GCCAGCAAAA 360 

AAGTGAGTGC TGTCAATAAC TCAACCAGCG GATAACGTTG CTGATTTTCG CCTGACAGTC 420

GCGGCAGCCC TTTGAGCATC AACCATGAGA GCAGCGGAAT ATTGTCACGA ACGCGGATGG 480

TCTGGTGGCA ATGCGGGACA GTTGCGAACC GGGTTAGCCA AGGGCTTTAT TTTTTGGACT 540

GCGGCACTCG G 551 



(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 595 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

CATTTACCAA ACCCCGTTCG AATATCTTAT CTATTGCCCA TCTCATATTA AATATAACCG 60 

ATAATTTGGT GGATACTAAT AGTAATTACC TTGTTATTGA AAATATAATT ATTGTTATTT 120 

TTAGCCTCAT TAATTAAATT GAAAAATCCT CTCTAATTTT TGTCAGATTA GGGCTGTAGA 180 

AAGGATCGAG TTCAAGATGT TTACCCCATT TGCTTTTCAT AAAGTCCACT TCCCTGGCAA 240 

ATCTGGCTAG TTTCTCCGGT GAATCTTCGG CTCCTCGACT AATCGATTCA TAGTGGTAAA 300 

GCTCGGCATA AGGTGTCCAG AGATTACGAT ACCCCGCTTC GNGTACTTTC AGACAGAAGT 360

CCACATCATT AAAAGCAACA TGCAGATTCT CTTCATCCAA CCCGGCAACT TCCTCATAAA 420

TATCTTTGCG AATAAGCAGG CAAGCCGCCG TGACGGCCGA GAGAGTTTGT GTCAACAACA 480

AACGGCTGAA ATAGCCCGGA TGGTGGCGAG GATAATGTTT ATGGGAGTGT CCAGCTACAC 540

CACCAATACC GAGAATCACT CCGCCATGTT GTAAAAGTAT CATTACTGTN ATAGG 595
(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 399 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 
TGGCAGTTGA ACAGATTTTC ACATCAGCAA CAGATTAGCG AACGGGACTT GGCATTAGCC 60 
GAGCGTTTTA GTGAANGTTT AGCTCTAACA CGTCTATTAG AAGAGCGCAC GCAGNATTAT 120 
CACTGAACTA GAGATTGAAA AACAATTGCT TACCACCAAG TTGTCTGGCG TAGAGCAGCA 180

GTTAAGGGCT GAGCAAGAGT CGCTTCAGCA GGCCCAGTCT GCATTGCTCT CAGCAGCAAA 240

AGAAAAGCAA CATCAACTTG ATGAGTTGGA ATCGGTGCTC AATGAGCGGT ACAGTGAGAT 300 

TGCAACCTTA ACCCGTTGGC TGGAAGAACG TGATCAGGCA CTCCTTAGTG CAGCAAGTGA 360 

ACAACAACAG ACCAATGANA CCATATAGAG CTCAGCCAG 399 
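
Several entries use single-letter codes other than A, C, G and T, for example N in this entry and Y, K, S, M and R elsewhere. Assuming these follow the standard IUPAC nucleotide ambiguity codes (the convention is not stated in this section), each letter stands for a set of possible bases. A minimal Python sketch that lists such positions within a run of bases; the function name `ambiguous_sites` is illustrative only.

```python
from typing import List, Tuple

# Standard IUPAC nucleotide ambiguity codes (assumed to be the convention
# followed by this listing).
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_sites(seq: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, symbol, possible bases) for each non-ACGT symbol.

    Spaces between groups of ten are ignored. A symbol that is not an IUPAC
    code (for example an unreadable character) is returned with "?".
    """
    sites = []
    bases = seq.replace(" ", "").upper()
    for pos, ch in enumerate(bases, start=1):
        if ch not in "ACGT":
            sites.append((pos, ch, IUPAC.get(ch, "?")))
    return sites
```

Positions are counted within the string passed in, so to place a site in the listing, add the position printed at the end of the preceding row. For the final row of SEQ ID NO: 88, the N is the 19th symbol of the row, which is base 379 (360 + 19).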



(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1013 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 



ATACTCTGCT TGTTGAGCAG CCATTACGTC GCTTTGTGAC GCAATATTAG ACTCGTGCAC 60

TGCTATTAGT TGAGTCAGTT CATCACATTG TTTAGAAGCC GCAGCCAAAG CAAGAGTTTG 120

CTCATCTATG CTTTGCTGCA ATGTTTGTTG CACAAGTTGC CCTTCTTCCA GCTGTTGCTG 180

TAGATTTGCA CTTACCTTTT TCAGTGCATC ATATTCCAAG CCTAACGTAT CGTGCTGTGC 240

TTCCAGTAAT CCATAAGCAT GCTGCAACTG GTTTTTAGTT TGCTGCTCAC CGTCAAGCTG 300

TTGCTGCAAT GCATTAGCCT GCTGTTGCAA CAAGTTCACC ATATTGTCTC GCTCGGCCAG 360

TGTACGAACC TGTGTATCCT GGATATGTAG CGCTTGTTCC AACTGAAGCT GTAATTCGGT 420

AATTTGCCGC GAATGTTCGC TCAATGCTCT GTTGCTCTTG CTGAGCGCGA GAGTAAGGTG 480

AGATGCACGC TGTGTTTCTT CACTCAATTG TAACGTCAGG GTATTGACCT GTTGCTCCAG 540

TTGATGGCGA GCTTGCTCCT GGCTCGTGAT GCGACTCTGT TGCTGCTCTA GTTGATGCAG 600

AGCTGTATGC AACTCATCGT TGGCTTGTAT TCGCTCCTGC GACCATACAC TCAAGTTTGT 660

TTGGGCCTCA TTGAGCTGTT CTTGCAATAA TGCCACCTCA GATGTCAGCG AATTGATATG 720

TTGCTGGGCA AAAGATAGCT CATCAGATTG CACTTGAGCA TGTGCAAGCT GCTTTTCCAT 780

TTCTAATATG CTGTTATGTT GTGCAGTAAT GCGCTCGGCA AGACGCCCCC TTTCCAATGC 840

CTGCTGTTCT ACCAATAGCT GCCGTTCAGC CTGAATGTCA TCTTGTTGTG TAGACAACTG 900

ACGTTTTAAC TGGGAATTCT CCCAACTCTC GCTACAAGAT TTNCCCAAAC GACAAAAGAT 960

GTCTTGGACT TGTNTGGGTT ACACGAGCAT TTTCTGAGGA TTTTATACCA ATN 1013



(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 689 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90:

GATATCCACA TCGAGACGTT TGAAAAGAGT CTGGTGATCC GTTTTCGTGT TGACGGCACA 60 

TTACATGAAA TGCTGCGTCC GGGGCGCAAA CTGGCCTCGC TGCTGGTGTC GCGTATCAAG 120 

GTGATGGCGC GGCTGGACAT TGCCGAAAAG CGCGTGCCGC AGSATGGACG TATTGCGCTG 180 

TTGCTGGGCG GCCGGGCGAT TGACGTGCGT GTATCAACCA TGCCTTCCGC CTGGGGGGAA 240

CGGGTGGTGC TGCGACTGCT GGACAAAAAC CAGGCTCGCC TGACGCTGGA GCGTCTGGGT 300 

TTAAGTCTCG AACTGACTGC GCAGTTGCGC CACTGTTACA CAAACCGCAC GGCATTTTTC 360

TGGTGACGGG GCCGACCGGT TCCGGCAAAA GCACCACGCT GTACGCTGGA TTGCAGGAGC 420

TGAACAACCA CTCGCGTAAC ATTCTCACGG TTGAAGACCC TATCGAATAC ATGATTGAAG 480

GGATCGGTCA GACGCAGGTT AACACCCGCG TCGGCATGAC ATTCGCCCGT GGCCTGCGCG 540

CAATTTTGCG TCAGGACCCG GATGTGGTGA TGGTCSGTGA AATCCGCGAT ACCGAAACCG 600 

CAGAAATCGC TGTTCAGGCT TCAACTGGAC CGGACACCTG GGNACTTTCN ACGCTGGNAT 660 

ACCAAAAAAA AGGGGTGGGG GGATTATAC 689



(2) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1281 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 

CTCAGCAGAA CCGAGATCTT CCATCAGCTG GCGGGCCTCG GAAGANTCCC GCTGCCAGAC 60 

CGCATTCAGC CGCTGTTCAA ATTCGGCCTC GTCGATTTGC CTCAGCGTAA AGGGCGCGTT 120 

CAGCCCCCGT TGCAGCTCCT GCAAAACAGA GAGCGACAAC GGATGCACAT GGAGGATCTC 180 

CAGCGACGCT TCGCACCATG CCACCAGGCT AAACCGACGG CTGAAACTAT AGGGCAGACG 240

CACGGTGTTA GCGGTGGTTT CCTGTGCTAC AGGCACCATT AACGCGTTCT CCCGGCATTA 300 

AGGAACGCAC GAACTTCTGG CGGTAAGGCC TGATTTTGCG CAGGCAATAT CGCTGCGCAG 360 

TGTGCGGCAT CAGGCTTAAG CCCTGCTCAT CGCGGTAGAT TTGCTCGGCG CGCATGTAGT 420 

TATATTTGCG CTGCGACACA CCGTCTGCCG CCATACCGTC ACGCAGAATG GTCGGGCGGA 480

TAAACACCAT CAGGTTACGT TTTTCTTTTT TATCCGCCGT CGATTTAAAC AGGTTACCAA 540

TCAACGGGAT ATCGCCCAGC AGCGGCACTT CTCGCCACGC TTTCTCCCGC CTGGTCGTCC 600 

ATCAGACCGC CAAGCACAAT TAGCTCACCA TCGTTAGCCA ACACGGTGGT TTTCAGTTTG 660 

CGCTCACCAA ACACCACGTC GAGGCTGGTC TGTCCTTCCA CCTTCGACAC TTCCTGCTCA 720 
AT<:ACCATCT GTACCGCGTT TCCTTCGTTA ATCTGCGGCG TGACTTTCAG CATGATGCCG 780

ACTTTTTTCC TCTCTAGCGT GTTGAAAGGA TTGCTGTTAT TGGAGCCAAC GGTAGATCCA 840

GTTAATACCG GAACGTCCTG GCCCACCATG AAGAAGGCTT CCTGGTTGTC CAGCGTGGTG 900 

ATGCTCGGCG TGGAGAGCAC GTTCGAGCTG GAGTCGTTTT TGACCGCCTG TACCAGCGCC 960 

ATCCAGTCGC CTTTCAMCAC GCCAACCGCC GTACCGCTAA AGCCAGAAAG AAGCTGAGCA 1020 

AGCGTGGAGA GATCGCCGTT AGTATCCGGA TTTATGGTGG TAGCGCCGTT TTCACTGATC 1080 

ACCGTGGAGC CTTTCTGCGG TTTTGCYTGA GAAATCGTGC GCCCAGCGTA CCAATAGGGA 1140 

TCTGCGTACC GTTAGCAAAC TGCATTAATC CGGCATCTTT CGACGCCCAC TGCACGCCGA 1200 

AATTGATAAT TCACCTTCGG CAACTTCCAC GATCAACGCC TCGACATGTA CCTGAGCACG 1260

GCGAATATCC AGTTGTTCAA T 1281



(2) INFORMATION FOR SEQ ID NO: 92: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 421 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 

CAATATTAGC GCACGGCACC AAAGGTGATG AATGAGCAGG CTGRAATATT ATTTTCCCGC 60 

GGTGCAGAAA TCCTTGTTCT TGGTTGTACA GAAATTCCGG TTATTCTGGC GCAACGTTAA 120 

AGAGCAGCCT TCCCGCTATA TTGACTCACG GCGTCACTCG TTCGTGCCGG AATAAAATGG 180 

TACGAAAATC GTGTCGGTAA ACATTATCTT TTAACCCAAT AATCATTTAA ATCGCAGCCA 240

GAAAGTTATT CG£TTTTAAC TGAATTATAT TTATAACGGA GAACATTATG GTTTGGCTGG 300

AAATTATCGT AGTACTTGGT GCAATAKTTT TTGGTATTCG CCAGGGGGGA ATCGGTATTG 360

GTTTATGTGG CGGGCTTGGG CTTGCCATTC TGACTCTGGG ACTTGGTCTG CCTATGGGGG 420 

G 421 
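
Each entry in this listing states a LENGTH and prints the sequence in rows of ten-base groups that end with a running coordinate; IUPAC ambiguity codes such as N, R, Y, K, M, S and W mark positions that were not assigned a single base. One way to check a transcribed or scanned entry is to count the printed bases and compare the total with the declared LENGTH. The Python sketch below assumes the entry is supplied as a list of printed rows; `check_entry` is a hypothetical helper written for illustration, not part of the listing or of any sequence-listing tool. It also reports characters that are not IUPAC nucleotide codes, which can help locate OCR damage.

```python
import re

# IUPAC nucleotide codes: A, C, G, T plus ambiguity symbols (N, R, Y, K, M, S, W, B, D, H, V).
IUPAC_CODES = set("ACGTRYKMSWBDHVN")


def check_entry(declared_length, rows):
    """Hypothetical helper: compare an entry's printed bases with its declared LENGTH.

    Each row is one printed line, e.g. "GCGAATATCC AGTTGTTCAA T 1281".
    Returns (length_matches, observed_length, unexpected_characters).
    """
    bases = []
    for row in rows:
        # Drop the running coordinate: any trailing run of digits and spaces.
        row = re.sub(r"[\d\s]+$", "", row)
        # The ten-base grouping is layout only, so keep every non-space character.
        bases.extend(ch for ch in row if not ch.isspace())
    sequence = "".join(bases).upper()
    unexpected = sorted({ch for ch in sequence if ch not in IUPAC_CODES})
    return len(sequence) == declared_length, len(sequence), unexpected


if __name__ == "__main__":
    # Toy entry of 35 bases printed as a single row.
    print(check_entry(35, ["CATTACTGTT GCATTTGCTG TTAGCAAGTC ATATT 35"]))
    # (True, 35, [])
```

Because each row's trailing digits and spaces are stripped together, a coordinate split by extraction (for example `24 0`) is still removed; a stray digit in the middle of a row is reported as an unexpected character rather than silently dropped.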



(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1018 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 
GTTAACAATG GCGTAACAAA TTTCAATAAC GTAGAAGATT TGCTGTCAGA AAGGTCAATA 60 
TTTCCTTTCA ATGGGTCAAA GACTTGCTTC TGGAATTCAT CCGGTTTTTT CTCCAGACGT 120
TTTCCTTCTT CATAATAGTC AATATAACTT TTACCACTGA GTGTTTTGKC YCCATTTCTG 180
GTGACACCAG CTAACTCACC TATCAGCGTA TCCCMATGTT GCTGGGTAAT GAGGACTGAT 240
CTTTCAACAG AATACTCTTT ATTATACTGA GATAATATTT TAAAGTTATC TTCTAAAAAT 300
GCAGCATGGC GGGCATCATA TCCCATTTTC AAAGTAATTT TTGCCGTGTT TTTTCTCCCA 360
TTCAGCAATA ACATCGGCCA TTTTACTGGC GACATGTTCA AACATTGCCT GTTTTGAAGC 420
CTCAAGGATG CCTGAAATTA TCCCCGTAAC AGCCCCTACC AGCGCGCTTA CCGGTGCACC 480
AACCAGAGAT GTCGTTGCAG CAGCACTAAT ACCTGAAGAT ACTGAAGCCA GAACAGTGCT 540
TATCGTTGTT AACGATGCAT CAATAGCTCC TGTTTCTTTG TGGAAAGCAG CAAGTAAACT 600
GTCACCATCG TATCCAAGTT TTTTGAATCG TTGTGAATAC TCCTCTATTT TATTGGCACG 660
TTTAAACTTA TCGGCAATGG ACAGGAATGA GAGGGGACTA ATTGCCAGTG TCACAACAGA 720
AGCAATTAAA CCGGCAGCAG CAGCAGATGT AGATAACCCC TGTGCTGCAC GCTGTGCGAY 780
NAATATATTG AGAAATACCT TTTCCAACAT TACCCAGTAC TTTCGTTGTT AATTCAACAC 840
CTGCTGCAGC TTTAGTTCCG GTATCTGCAT CTGCATTGCT CAGAATGAAA CTTGCTGAAA 900
TCGCAGATAA AATACCCGAT ACAGTATCTA ACCCTGCACC GATATTATCA AGGTTAGGTA 960
AATTCTGTAA CTTATTACCA ACACCGTTCN GGNCTGTTGG TATTGGGATA ATACACTT 1018



(2) INFORMATION FOR SEQ ID NO: 94: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 

GGCAATGTTC AAATCGATAT TGTGCAGCAC CTGGGTTGGG CCAAAGTGCT TGGAGACGTT 60 

TTTAAATTCA ATCACAGGAT TTTCATCCTT CTTTCCAGAC GACGCAGAAT AAAGCTCAGC 120 

ACCAGGGTAA TAATCAGATA GAACACCGCC ACGGCGCTCC AGATCTCAAG GGCGCGGAAG 180

TTACCGGCAA TAATTTCTTG CCCCTGACGG GTCAGTTCCG CCACGCCGAT CACAATAAAC 240

AGCGAGGTGT CTTTAATGCT GATGATCCAC TGGTTACCCA GCGGCGGCAG CATACGACGC 300

GTGCCAGCGG TAAAATGACG TAGCGAATGG TTTCCCMACG TGAAAGACCG AGCGCCAGTC 360

CTGCTTCACG AAAACCTTTG TGGATAGACA GCACCGCACC 400



(2) INFORMATION FOR SEQ ID NO: 95: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1857 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:

CGTGTTCCCC TGGCCNGCTT GGTTTCGCCA TAGACGTTGA GCGGGGAAAT CACATCGGTT 60

TCCACCCAAG GACGTTCACC ACTTCCATCG AAAACATAGT CGGTGGAATA ATGTACTAGC 120

CACGCACCTA ATGCTTCAGC TTCTTTGGCA ATAACCGCCA CACTAGTTGC ATTGAGTAAC 180

TCGGCAAATT CCCGCTCACT CTCCGCTTTG TCGACTGCAG TATGGGCCGC TGCGTTAACA 240

ATCACATCCG GCTTGACGAG ACGTACCGTT TCAGCCACCC CTGCAGAATT GCTAAAATCA 300

CCGCAATAGT CGGTGGAGTC AAAATCAACG GCAGTGATGT GCCCCAGAGG CGCCAATGCA 360

CGCTGCAGCT CCCATCCTAC CTGACCATTT TTGCCAAACA ACAGAATATG CATCAGGTAC 420

GCTCCCTATA GTTTTGTTCA ATCCAGGATT GGTAGGCACC ACTCTTGACG TTGTTAATCC 480

ATTGTTGATT ATCCAGATAC CACTGCACGG TCTTGCGAAT ACCAGACTCA AAAGTCTCCT 540

CTGGCTGCCA ATCCAACGCA GCGCTCATCT TGCAAGCATC AATCGCATAT CGGCGATCGT 600

GTCCGGGGCG ATCCGCCACA TAAGTAATTT GATCGCGATA AGAGCCAGCT TTCGGTACCA 660

TCTCGTCAAG CAGATCACAA ATAGTATGTA CTACATCCAG GTTCTGCTTC TCGTTGTGAC 720

CGCCTATGTT ATAAGTCTCC CCGACCAAGC CAGTGGTCAC TACCTTGTAG AGTGCTCGTG 780

CATGATCTTC CACATACAAC CAGTCACGAA TTTGGTCACC TTTACCATAA ACCGGCAGCG 840

GCTTGCCATC CAGCGCATTG AGGATCACTA GCGGGATCAG CTTCTCGGGA AAGTGGTAAG 900

GGCCATAGTT GTTGGAGCAG TTAGTGACAA TGGTTGGCAG GCCGTACGTA CGGTACCAAG 960

CACGCACCAG ATGATCGCTG GAAGCCTTGG AGGCAGAATA GGGACTGCTA GGAGCGTAGG 1020

AGGTAGTTTC GGTAAAGAGC GGCAATGCCT CACCGGAGGC TACTTCATCC GGATGGGGCA 1080

GATCGCCATA TACTTCATCG GTAGAAATAT GGTGGAAGCG AAAGGCCGCC TTGCTCAACT 1140

CGCCCAGACT GCTCCAATAG GCGCGAGCCG CTTCCAGCAA TGTATAGGTG CCTACGATAT 1200

TGGTTTCGAT AAAGTCGGCT GGCCCTGTGA TAGAACGATC AACATGGCTT TCAGCAGCCA 1260

GATGCATCAC GGCATCTGGC TGGTGCAGAG CAAACACCCG ATCCAACTCA GCACGATTAC 1320

AGATATCAAC TTGTTCAAAC GAATAACGCT CACTTGACGA TACACTGGCC AAAGATTCCA 1380

AATTGCCAGC ATAGGTGAGT TTATCCAGAT TGATAACGGA GTCTCCAGTA TCACTAATGA 1440

TATGACGCAC CACGGCAGAG CCGANAAAAC CAGCACCGCC AGTAACGAGA ATCTTCATAT 1500

ATTTCGCTCT CTTATTTTAC AATTAATAGC TATTAAAAAT AAACTTGTTG ACTCCGATAT 1560

ATTAGAAATA TCGGGATACC GAACTAAATA TTTTTATATG CTTTTGCCAA GCAGACTCTA 1620

TATCCACCCT GTATCACTAT GCT7TCTGGC ATACAATATC CCATCATTGA CACAATGATA 1680

AACATATAAA TAAAGAAAAT TTTAAATCAT ATAACCAAAT TACTTTCATT TATTATCAAT 1740

AAGTATTTTG ATAAGAATAC CTATACCACA GGGAGCCCCC TGAAACATAA TATTAGCGAA 1800

GAATGATAAC TGATAGTTAC CATCTTAGAG ATAAAAACTT ATTTGTGTGG CGGGATG 1857

(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 



AGCTCTTTCG TGTAAAATAA AATACAGCAT ATCCTATATA GCTTACAATC ATTAAATGAA 60

GTCGCCAATA TTTATATGTT TTATCAATAT CAGCTTGACT CATTGTTATT TCTTTGTCAG 120

GAGACTCTGA AAATATGGAC ATATATAACC TCTTTTATTA TGAAATATTT TCAATAATAA 180

TAATCCGTTA GTAATCCTAT CATAGGGTAA TGTCTCATCA TGTTAAAATG ATCACATTTA 240

TAATCATGTC AAAAAGAACA ACAGAAAAAA TCATATAAAA TCAATTAAAT ATAATTGCCA 300

CATATTGTTG TTATTWAAAC ATTGGTGGTG AATTTAAAGC GAGAACAGTT TGTAACAGTG 360

ACTCCTTGCA GACTAAGTTA GAGTCTCCTT CTAAAATTAG ACGGWKTTCT ATTGATGGAT 420

AATAGTAAGC GCACCGTGAA KGACGTGGGG TAAAAATTAG TTTACAGATT GAGTGACATT 480

CCAGGGCAAC AACTCTTTCA CGCGGTTGGC AGGCCAGGTG TTGATTACAC TGATCACGTG 540

GCGTACATTA CCGGACTCGA TTCCGTTAAG TTTGCAGCTA CCGATCAGGC TGTACATCAC 600

TGCCGCACTC TCGCCTCCAC CATCAGAGCC GAAGAACATG TAGTTACGCC GCCCCAGTGC 660

AATACCCGGA GGCGTTTTCA CACAGGTTAT TGTCGATCTC CACCCAGCCA TTGCGGCAGT 720

ATTCGTTCAG AGCGTCCCAT TGCTTCAGCA GATAGGTGAA CGCTTTCGCT GTATCCGAGT 780

GGCGCGACAG TGCTCATCTG CCCCTGGAGC CACTCATACA ACGACTGCAT TAGCGGTACC 840

GTTCTGGCTT TTCTGACCGC CAGTCGCTCT TCTGCCGGAC TGCCGCGGAT CTCAGCCTCG 900

ATAGCGTACA GTTCACCGAT ACGCTGCAGG GCTTCCGTGG TGATGTCAGG TGGCGCTCTT 960

GCATGCACAT CGTGGATTTT TCTCCGGGCA TGGGCCATAC AAGCCGCTTC GGTTACCTGA 1020

CCGCTTTCGT AAAGAGCATT GTAACCCGCA TATGCATCGG CCTGCAGGAT ACCTCTGTAG 1080

TCCGCCAGAT GTTGCTGTGG GTGGATGCCT TTGCGGTCGG GAGAGTAT 1128



(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 439 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 

GTTTGCTTAC GAACCGTGAA ATATGACGGT CCCATATAAC TGCCTGATAC TTGTATATCA 60

TATACTTGTG CATGCATGTC ATCATTAAAA AGTACTTTGT CACCGTCTTT AAGTTGAAGA 120 

CGTGTAAAAT CTTTATACGG CAAGTAGACG GAAAACGGGC GCTTTCCCTG TCGCCAATCA 180 

CACCGACATG ACTGACTTTT GCGAGAGGAA GTGCATAATT CACGAATTCA GAGCCTAATG 240

CATTGCGCTG GGTAAGCTCA AATCGGAATG GGTTTCGAAC CTTTCCCGCA ACATTGATCA 300 

TTGGACCTTG TTGCTCAACT GAAAATCACA TCTTGATCTT TTAATGCCAG CTTCGGGAGT 360

TTCCCATACC GTATGAAATC ATAAAGATCA ATTTGCKGTG NTTACTGCTA TTTTGTGCGT 420

GAACACCTTA ATTTTTGCG 439



(2) INFORMATION FOR SEQ ID NO: 98: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 906 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 
TATTCGTAAT TAGTTATAAA CAGATGATGT AAACACCAGT TGACTAGAGT CAATCTTATA 60 

CTGGCAACAT CTATGATTAA TTTGTGTGGT TATAATTTTA AATATCTTAT ATTTATGGGC 120 

TATTATTGAT ATCTGTCAGA GTATCAATAA TAGAAGGTAA TTGTTTTACA TACTATCAAC 180 

TTTTGGATAA CGTTTTAAAA TGCACCTTGC ACATCGTATT TTATTATTTT CACTAATCTT 240

TTTTATAACG GCCTGCGCAC ATGATCCAAA ACAAGTTGAA GCCTCTCGTC CATTGGTAAC 300 

AGCGATTAAT TCTTCTTATT CTCTTATTCC TGAAGATTTG CAGGCACCAT TAAATAACCA 360

AGATCAAGGC ACGACATTCA ACAAAAATGG CGTAATTTAT ACTATTGAGG AAAGGTATAT 420

ATCGGCTTTA GGTTCTCAAT GCATAAAGTT AAGTTATGCG ATGAATAAAA ATTATTCAAA 480

GCGAAGTGTT GTATGTAAAG AGAATAACAA GTGGTATCAA GTACCTCAGT TGGAACAAAC 540

ATCAGTTAGC ACTTTGCTTA TTGAAGAATA AAGTTGAAGG TAGACGGTTA GAAAATAATG 600 

AAAATTTCGC AACTTAGCAC TCTTCTCTTT CTTATTTCTG CATCAGCATT CGCCGCAATA 660

GAGCAAAATC AATCTAATGG TTCACATTTA GATTATGATC TTGCTGCCTC GACAGGAGAG 720 

TCTCGGAAAA TGCTAGCAGA CATCACTGGA CAGCCTAATA CAACCTCCAC AACAGGAAGC 780

TTCACACAAC AGAATCGTAA TGGGATGTTG CTTCCAGGAG AGTCAGATGT ACGAAAATTA 840

CTGCCGCAAT CTGAAGCAGG CTTACCTCCT CCGTATGGTG CTAATTTATT TGCCGGAGGC 900 

TATGAA 906



(2) INFORMATION FOR SEQ ID NO: 99:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1395 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 

GCGGCCTGAT ATATGCCGTT ATTACAAAAA GAGGATCAAC CACACTGCCT TTTGGACCGT 60 

GTTTAAGTCT GGGCGGTATA GCAACACTTT ATCTACAGGC ATTGTTTTAA TGATAACCAC 120 

GTCATTATCA AAGTGACATT TTAACTCTTA TTAATAACCT TAGAGATTAT TTACCATGTC 180 

GATAAAACAA ATGCCAGGGA GGGTATTAAT ATCGCTATTG TTGAGCGTTA CAGGATTATT 240

AAGTGGCTGT GCCAGCCATA ATGAAAATGC CAGTTTACTG GCGAAAAAAC AGGCGCAAAA 300 

TATCAGCCAA AACCTGCCGA TTAAATCTGC GGGATATACC TTAGTGCTGG CGCAAAGTAG 360

TGGCACGACG GTAAAAATGA CCATTATCAG CGAATCGGGT ACTCAGACCA CGCAGACACC 420 

TGACGCCTTT TTAACCAGCT ATCAACGACA AATGTGCGCT GACCCAACGG TGAAATTAAT 480

GATCACCGAG GGAATTAATT ACAGCATAAC GATTAATGAT ACACGTACAG GTAACCAGTA 540

TCAGCGGAAA CTGGATCGTA CCACCTGTGG AATAGTCAAA GCATAACGTC GGGTAGATAT 600 

AAATTGGCGC GGGTTGTTTT TCGTGACGCA CGAATTTATC TCATTCAATG GCTGACAAAA 660 

ATTCGTCACA CTCTTAACCA GAGACAATCT CTTAATACAG ACAAAGAGCA TCTGCGCAAA 720 

ATTGCACGCG GGATGTTCTG GCTGATGCTG CTTATTATTT CTGCAAAAGT GGCGCATTCA 780 

CTCTGGCGCT ATTTCTCCTT TTCTGCGGAA TATACGGCGG TTTCCCCATC GGCGAATAAA 840

CCGCTCCGTG CGRATGCAAA AGCGTTCGAT AAAAATGACG TGCAATTAAT CAGCCAGCAA 900 

AACTGGTTTG GCAAATATCA GCCCGTCGCC ACGCCGGTAA AACAACCCGA ACCTGCACCT 960

GTGGCCGAAA CGCGTCTTRR TGTGGTGTTG CGTGGGATCG CCTTTGGTGC CAGACCCGGC 1020 

GCGGTTATTG AAGAAGGTGG TAAACAGCAG GTCTATTTGC AGGGTGAACG CTTGGCTCGC 1080 

ACAACGCAGT GATTGAGGAA ATCAACCGCG ACCATGTGAT NTGCGCTATC AGGGAAAAAT 1140 

AGAGCGCCTG AGCCTGGCTG AAGAGGAGCG TTCCACCGTT GCCGCGACCA ACAAAAAAGC 1200

TGTCAGTGAC GAAGCAAAGC AAGCTGTTGC TGAACCTGCT GTCAGTGCGC CAGTTGAGAT 1260

CCCNGCTGCC GTGCGTCAGG CACTGGCGAA AGATCCGCAG AAAATTTTTA ACTATATCCA 1320 

GCTTACGCCT GTGCGTAAGG AAGGGATTGT CGGTTATGCA GTGAAACCGG GGGCAGATCG 1380 

TTCTCTGTTC GATGC 1395



(2) INFORMATION FOR SEQ ID NO: 100:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 380 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 



(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100: 



CACTTGAATA AAACTGACAC CGTTTACCTC CATAATAGTG AGCATAGCCG CCATTGCGGC 60

CTGATCGGCG AACCGGAAAT CGCAACCTGC GAACGACAAC CGAACCGGCA AGCGTGCGGG 120

AAGGACGGAT ACCGGACTCT TTCGCCACTT CAGCAATCAC CGGCAGCGTG GAAAAAACAA 180

TAAACCCAGT ACCGGCCATA ATGGTCATAG ACCAGGTGAT AATCGGCGCG ATTATGTTGA 240

TATATTTCGG GTTACGCCGC ATAAAATTAC CAGCGACGGT ACCAGATAAT CCATTCCCCT 300

GCGGCCTGTA AGGCTGAGGC CGCCACAACA ACGGTCATAA TAATCAGGAT CACGTCGACT 360

GGCGGCGACC CCATAGGCAG 380



(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 995 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 

CTTTACGGTT TAATAGGGGA ANGCCGACTG GATGNAAAAA TGGAATCTGG AGCCCAGAAT 60 

AAATCTGAAT TTAATGTGGA CTGGATATGC TCCAATAACC CCGGCAGGGA GTCATCTGTG 120 

CGAAGATATT TGCGTTATGC TGTAATATAA TAATTCAATG TATTTCAGGA ACAGTAATAT 180

ACTACAGTTT CTACTTTCTT GTATTTAATA AATTGTTCCG CATCGCTAAA AGCAGGTCTT 240

TCAGAAGCCA CAAGAATTCT GTGGTCCCAG TATTTTTAGT TATCCTATTT TTATATCTAA 300 

CTTGTAATAC TTACAGCATT TTCATTCATC CTAATGGAAG GCTGTAATAA TCTTTGAGCT 360

TAGAAACATC AAAATTATGC ATCTCATTAA TTTTGTCAGT CACACGACCT CTGGTAAAAA 420

TAAAACCCCC AGAAATATGC CATTTCTAGG GGGGGCGTAA GAATCAATAT ATTTTAGTGT 480

TGTTACATTT AGCTCTTAGC TCTTAGCTCT TAGCTCTTAG CTCTTAGCTC TTAGCGTTTG 540

TAGTTTCATC GCAATGAGTA AAAGGACAAC AAGAATAAGT GATAACGTTA AGAGAAGAGC 600

ATAGAAACCA TTCCAGTGGT ATATTTCTAT TATTTTAGAC AATGGATAGC CAGCCGCGGA 660

CGCACCAAGA TATGCGAATA AACTAACAAA ACCAGTAGAA GCACCAGATG CATATTTATG 720 

TGAGTTTTCA GCAGCTGCCA TTGCGATCAG AAATTGTGGC CCAAAGATAA AGAAGCCAGT 780 

GATGAAAAAT AATAACGAAA AAACATATTT ACTATCAATA GAAACCAACC ATAGACATGC 840

AGAAGCAATG ATTATACCAA TTGTATAAAT AACATTCATT TGAGAGCGAT TGCCCTTAAA 900 

CAGAATATCT GATCCCCATC CAGCTACGAT AGCACCAAAA AAGCCTCCAA CCTCAAACAT 960

CATTACTGTT GCATTTGCTG TTAGCAAGTC ATATT 995



(2) INFORMATION FOR SEQ ID NO: 102:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 817 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:

TAAAAGCGAC TCCATGTGAA ATTTCTGTTT GTCGTTTTTT CCCCGTTGTA GCGGCTCTGC 60 

TCCTGGCTTC CCTGATAGTC AGCCCGCAGG CGCCAGGGCC CCAGATTCCC CCCCACAGTC 120 

CCGTTATAAC TGAACTGATG AGAGTCTCCT CCCTGATAAT TACGGGAAAC CGTCCCGTTG 180 

AGGTTATAAT CCAGCATCAG TCCGGGAATG CCGTCGTCCC AGCGTGAGGG AGGCAGCCAG 240

GTGGCATCAG AATACTCAAG CCAGGCCTGC GGCATATTGA TGCGTAATAC GCCCGCTCCG 300 

GTATCAGGAC GAATATCCAC TCCCGGCAAC CCATGAAAAT CCGCACACTG ACCATCATGC 360

CAGTAAACAA CTTTATCCAG AGATTCTGCT GTTAACCCCA TCAGTCTGAC CATATCTGAT 420 

GTCAGACAGC TGCGGCAATT TTTTTTCTGC CTTATCTCCT GACAACGCAG GTTCAACAAA 480

TGAMATCTGT AACGATGCGG GAGAAATACT TTGCCCGTTA ACAATCACAT CCAGAAGATA 540

TTGCCCCGGC AGAACATAGC CGGCTTCTGA AAAACGGGTG AAGTCAATAT TTTTCTTGTC 600 

CGCTGCGTCA AGTACATCTG TATTAAACTC AACGGCACTG GCTGCGTTAC AAAACAGAGA 660 

CAACAATATC ACACAGGTAA TATTGTTGAC TGCAAAAGGT ATTCTGTCTT TCATTCCACG 720 

CATCACCAGA TTCACAAAAA AGATAAATAA CCGGACATCT CACCGGAGTG ACTCACTCAT 780

AATCGACCCG GAATCCCAGC ACAGCAAAAT AATTTCC 817 



(2) INFORMATION FOR SEQ ID NO: 103: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 709 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: 

TTTTTGTCAG AGCGTTCACT CTCTGGCTGG ATGATTTCGG CTCGGGAAAT GCAGGCTTAA 60 

TGTGGGGACT GTCGGGGATG TTTGAACGGG TAAAAATAAG TCATGAGTTT TTTCATTATG 120 

TCCTGAAAAA CGGGTGTGCA ATGCCACTTC TCCGTGCTGT GGCAGACACT GTTGCCTGTC 180 

ACAACAGAGG CGTGATACTC GAAGGTGTTG AAAATGAAGC GTTGTTCCGT ATTGCCAGAG 240

ACATGAATGT CCAGGGCTGT CAGGGATGGC TCTACAGGCG TGTGGGGGTT GATGAATTAT 300

CCGCGCTTAT TCAGCAGTAT GAATAATCCT TTTTCACAGA CTGGTCAGCT GTCAACATTT 360

ATGTTTTTTT ATCTGCGGGA ATTTATCCGT CTGCCTGTCG GGACTACTCT GTCATACAGA 420

AATCAGGCCA GAATAAATTG TTGTGGAAAG GTGAGATTTA CCGGATGACT GATGTGCTCT 480

TGTGCACAGG TATACAGGCA GTGTGTTTCC AGTATATGGA AAATGATTAA ATGAATAACA 540

CAGACTTATT AGAAAAAATC ATCAGGCATC AACAAAACAA AGATCCTGCA TATCCTTTCC 600

GGGAACATGT TTTGATGCAA CTCTGTATCC GTGTAAACAA AAAAATACAG AACAGTACAT 660

CTGAGTTTTT TGGTGCATAT GGTATAAATC ACTCAGTATA TATGGTTCT 709



(2) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 485 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:

TCATCAAGGG ACGGGGCATA TCTGGATGCG ACAGGGCAAA CCAACCACTG AGAATCCAAC 60 

CTGCCAAAGC CTGACCAGGA AGTCCGACGT TAAAGAAACC AGCTCGACTG GCAACGGCAA 120 

AACCAAGACC AATCAAGACC AGAGGACCCA TAGCACGGAA GATTTCTCCA ATCCCACGCA 180 

GACTGCCAAA GGCTGTATAG AACAATTCTT CGTAGCCCCA AATAGCATCA TAACCGAAGA 240

TCCACATGAC AATGGCTCCG AGTAAAATTC CTAGGAATAC AGAAATCAAG GGAACCGAAA 300 

TTTGTTGTAA TTTTTTAGAC ATCACTCTTC TCCTTTCCCA AGTTYCCACC AGCCATCAAG 360 

ACACCAAGTT CTTGTTTATT GGTTGTTTCT GGTGATACAA TACCTTGAAT CTTACCATCG 420

TGGATAACGG CAATACGGTC TGAGACGTTT AAAATCTCAT CCAATTCAAA GCTGACNACA 480

AGGAC 485



(2) INFORMATION FOR SEQ ID NO: 105:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 459 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:

AGCAGAATAG GCAACATCAC CACGCCGACA AACAGCGAGA AGAGAATGAC GCCAGCCGCC 60 

AGGAACACCA GCTCATAGCG CGCCGGGAAG ACGTTACCAT CCGGCAAGAG CAGCGGGATA 120 

GAGAGCACAC CGGCCAGAGT GATCGCCCCA CGCACCCCGG CGAAAGACGC GATCAGGATT 180 

TCTCGTGTGG TCCACGAACC AAACTCCATC GGCTTCTTCT TCAGGAAGCG GTTGCTGAAC 240

TTTTTCATCG TCCACAGCCA GCCGAAACGG ACCAGCATCA GCGCCGCATA TATCAGAATA 300

ATATTGGTAA ACAGCATCGA GATTTCGACG TTAGGGTCGA TTTCTTGCTG GCCATCAGCG 360 

GACGTCTTCC AGRATTACCC GGCAGCTGCA GAGGTTAACA GGAGGGAACA CCATGGCCGT 420

TTTAAGGACA ATTTCNAGCA TGGGGCCANG TGGTGTTTT 459



(2) INFORMATION FOR SEQ ID NO: 106:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 908 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:

TTAATAGCAC TAATACTGTC CTGCTCTATT CCGCTGACAT TTTCAGTCAG CTGCTGTATG 60 

GGATGGGTTA CCCAAAACCA GACCAGCATA CCTGACAAGA GACCGCATAT CACTACCAGA 120 

AACAGCGACC AGTACAGTGC ATTCCATAGT GCCTTTGTCC AGGCTGTATC AGTAAGAGCA 180 

TTAAGTTCCT CTCCCTGTAA AATAATATAC AGATATCCTT TCGGTTCATC ACTCTGGTAA 240

AGCGGTGCGG TACTGAAAAC TTTTTGCTTA TTTACACTTC GGGGATCATC ACCATATACG 300 

GGCCAGACAC TGCCGGAGAG AAATTTTTTC AACGGTGCAA TATTGATATA CCGGCGTTTG 360 

AGATGACCCG GAGGGCGGCC TCCACAAGCA GTCGCCCTTC CGGTGAAACC ATATACAGCT 420 

CCACACTGGG ATTAAGCGTC ATCAGACGCT CAAACAGACT CGTTAATGTC CGGTGTTACC 480

AGACAAAACA AGCATCGCAA GACGCCACAA ACGGTGCGCT TACTTAAATA AGCCGGTTAC 540

AGGTGAAAAA TCACGTCCTG ATATTCAAAT GTTTTTTCAG GTCATATTTT AGCAGGACAC 600 

TACCAGCACC TAACAGCAGC ACATCTTTTA TAACAAAACT GTCAACTTTC CCCAGTTGTG 660 

GTAACAGGCT GAGCGTGGTT ATTCCTGTAA CAATAACGAT AATATCTCCC AGTACACCAG 720 

CAGCAGGCCT GAAGAAACCG ATAATCAATG CCAGAAATGT GATAGTTTCC ACTATGCCGA 780

GGAAATAGCT CCCTCCATGA ATACCAAATA TAATATACAG GATATTCAGC CAGGTGGGAT 840

ATATCAGGGG CTTGAGAGCC ATAACTTCAA AATCAAACCA TTTATAAGTC CCAAAAAGCA 900 

TAAATATT 908 



(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1057 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107: 

CGGGCTAACC CAATATGCTT TATTAACCCG GGATAATTAC CCTGTTGCAT ATTGTAGTTG 60

GGCTAATTTA AGTTTAGAAA ATGAAATNAA ATATCTTAAT GATGTTACTT CATTAGTCGC 120

AGAAGACTGG ACTTCTGGTG ATCGTAAATG GTTCATTGAC TGGATTGCTC CTTTCGGGGA 180

TAACGGTGCC CTGTACAAAT ATATGCGAAA AAAATTCCCT GATGAACTAT TCAGAGCCAT 240

CAGGGTGGAT CCCAAAACTC ATGTTGGTAA AGTATCAGAA TTTCACGGAG GTAAAATTGA 300

TAAACAGTTA GCGAATAAAA TTTTTAAACA ATATCACCAC GAGTTAATAA CTGAAGTAAA 360

AAACAAGTCA GATTTCAATT TTTCATTAAC AGGTTAAGAG GTAATTAAAT GCCAACAATA 420

ACCGCTGCAC AAATTAAAAG CACACTGCAG TCTGCAAAGC AATCCGCTGC AAATAAATTG 480

CACTCAGCAG GACAAAGCAC GAAAGATGCA TTAAAAAAAG CAGCAGAGCA AACCCGCAAT 540

GCGGAAAACA GACTCATTTT ACTTATCCCT AAAGATTATA AAGGGCAGGG TTCAAGCCTT 600

AATGACCTTG TCAGGACGGC AGATGAACTG GGAATTGAAG TCCAGTATGA TGAAAAGAAT 660

GGCACGGCAA TTACTAAACA GGTATTCGGC ACAGCAGAGA AACTCATTGG CCTCACCGAA 720

Af*C APAATTA GACAAATTAC TGCAAAAGTA TCAAAAAGCG 780

GGTAATAAAT TAGGCGGCAG TGCTGAAAAT ATAGGTGATA ACTTAGGAAA GGCAGGCAGT 840

GTACTGTCAA CGTTTCAAAA TTTTCTGGGT ACTGCACTTT CCTCAATGAA AATAGACGAA 900

CTGATAAAGA AACAAAAATC TGGTGGCAAT GTCAGTTCTT CTGAACTGGG CAAAAGCGAG 960

TATTGAGCTA ATCAACCAAC TCGTGGGACA CAGCTGGCCA GCCTTTAATA ATAATGTTNA 1020

ACTCATTTTC TCAACAACTC AATAAGCTGG GGAAGTG 1057



(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 752 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: 



TACCGGGCCC CCCCTCGAGG TCGACGGTAT CGATAAGCTT GATATCGAAT TCCTGCAGCC 60

CGGGGGATCC ACTAGTTCTA GAGCGGCCGC CACCGCGGTG GAGCTCCAGC TTTTGTTCCC 120

TTTAGTGAGG GTTAATTTCG AGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA 180

ATTGTTATCC GCTCACAATT CCACACAA3A TACGAGCCGG AAGCATAAAG TGTAAAGCCT 240

GGGGTGCCTA ATGAGTGAGC TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC 300

AGTCGGGAAA CCTGTCGTGC CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG 360

GTTTGCGTAT TGGGCGCTCT TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC 420

GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG 480

GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA 540

AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCT GACGAGCATC ACAAAAATCG 600

ACGCTGAAGT CAGAGGTGGC GAAACCCGAC AGGACTATAA AGATACCAGG CGTTTCCCCC 660

TGGAAGCTCC CTCGTGCGCT CTCCTGTTTC CGACCCTGCC GCTTTACCGG ATANCTGTNC 720

GGCTTTCTCC CTTCGGGAAG CGTGGCGCTT TC 752



(2) INFORMATION FOR SEQ ID NO: 109:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 486 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109: 

CTTGGGTAAT NGACCTCATA TCCCTCCGCC AAAAAAGGAT CTACATGCGA TTTTGCGAAG 60 

CCAGCGTTGA TTGTAGGCGA GAGAATGGTT CTGTTGTTTT GGTACATTTC AGTTGTCATG 120 

GATTTCACAA ATGTAGCATG ACCTTTCACC TGTCCAAGAG ACTGCAACAC CATCTGTCCA 180

AAACAATAAA TAGGAATCAA ACAGGCTACC AACATCAACA AGTATCCCAA TAAGGCTCGT 240

AGTTTAGTCC TTGACATGAC GCCCCTCCAA TTGCTTTTCT AGTCCTTTGA CAATCCGTCG 300 

ATTACGATAC ACGCGATACA GCAAGAGAAG GATGACCGCC ATCGCTCCTA GTAATAACCA 360

CAACCAGAAT TGCCCACGCT CTCTCACCGC TCGATTCCGC TCTGCAATTG GTGCCGTATA 420

CGGAATCCGC TTCCCACGTA CCAACAGACG ATGACTGTTA ATCCTATACG GTGTACNAGT 480

CAACCA 486



(2) INFORMATION FOR SEQ ID NO: 110:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 313 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110: 

TTACGCNTTC AACCAGGTCT TCTGGTTTAC CAACGCCCAT CAGGTAACGC GGTTTGTCTG 60 

CCGGAATTTG CGGGCATACA TGCTCCAGAA TGCGGTGCAT ATCTGCTTTC GGCTCACCCA 120 

CAGCCAGACC GCCGACAGCG TACCATCAAA ACCGATATCT ACCAGACCTT TAACAGAAAT 180 

ATCACGTAAA TCTTCGTAAA CGCTGCCCTG GATGATACCA AACAGCGCAT TTTTGTTTCC 240

GAGACTGTCA AAACGCTCAC GGCTACGTCG CCCAACGCAG AGACATCTCC ATGGAGCGTT 300 

TTGCGTAATC CCA 313 



(2) INFORMATION FOR SEQ ID NO: 111:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1613 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 



CGGAAATCCC AGTAATTCCA TCCTCANATA TTCCACTCAN CCTCACTGTA ACAAAGTTTC 60

TTCGAATAAT AAAAATCATG CTTTCTGTTA TCAACGGAAA GGTATTTTTA TTCTCTGTGT 120

TTGCTTTATT TGTGAAATTT AGTGAATTTG CTTTTTGTTG GCTTTATNTG ATGTGTGTCA 180

CATTTTGTGT GTTATTTTTC TGTGAAAAGA AAGTCCGTAA AAATGCATTT AGACGATCTT 240

TTATGCTGTA AATTCAATTC ACCATGATGT TTTTATCTGA GTGCATTCTT TTTGTTGGTG 300

TTTTATTCTA GTTTGATTTT GTTTTGTGGG TTAAAAGATC GTTTAAATCA ATATTTACAA 360

CATAAAAMMC TAAATTTAAC TTATTGCGTG AAGAGTATTT CCGGGCCGGA AGCATATATC 420

CAGGGGCCCG ACAGAAGGGG GAAACATGGC GCATCATGAA GTCATCAGTC GGTCAGGAAA 480

TGCGTTTTTG CTGAATATAC GCGAGAGCGT ACTGTTGCCC GGCTCTATGT CTGAAATGCA 540

TTTTTTTTTA CTGATAGGTA TTTCTTCTAT TCACAGTGAC AGGGTCATTC TGGCTATGAA 600

GGACTATCTG GTAGGTGGGC ATCCCGTAAG GAGGTCTGCG AGAAATACCA GATGAATAAT 660

GGGTATTTCA GTACAACACT GGGGAGACTT ATACGGCTGA ATGCTCTTGC AGCAAGGCTT 720

GCACCTTATT ATACAGATGA GTCGTCGGCA TTTGACTAAA TTATGGCATT CCGGAGTTTC 780

TGGAAGATAA AAAAAGAAGC CCTTATCAGA AAGCAGACAG GTTATATCAG TATTCTGTCG 840

AT7\AT\TT\7'; fT H 1 A/irt. 1 HAL L AT APn AH A AT ATTATTTGTA TTGATCTGGT TATTAAAGGT 900

AATCGGGTCA TTTTAAATTG CCAGATATCT CTGGTGTGTT CAGTAATGAA AAAGAGGTTG 960

TTATTTATGA TTAAGTCGGT TATTGCCGGT GCGGTRCTAT GGCAGTGGTG TCTTTTGGTG 1020

TAAATGCTGC TCCAACTATT CCACAGGGGC AGGGTAAAGT AACTTTTAAC GGAACTGTTG 1080

TTGATGCTCC ATGCAGCATT TCTCAGAAAT CAGCTGATCA GTCTATTGAT TTTGGACAGC 1140

TTTCAAAAAG CTTCCTTGAG GCAGGAGGTG TATCCAAACC AATGGACTTA GATATTGAAT 1200

TGGTTAATTG TGATATTACT GCCTTTAAAG GTGGTAATGG CGCCAAAAAA GGGACTGTTA 1260

AGCTGGCTTT TACTGGCCCG ATAGTTAATG GACATTCTGA TGAGCTAGAT ACAAATGGTG 1320

GTACGGGCAC AGCTATCGTA GTTCAGGGGG CAGGTAAAAA CGTTGTCTTC GATGGCTCCG 1380

AAGTGATGCT AATACCCTGA AAGATGGTGA AAACGTGCTG CATTATACTG CTGTTGTTAA 1440

GAAGTCGTCA GCCGTTGGTG CCGCTGTTAC TGAAGGTGCC TTCTCAGCAG TTGCGAATTT 1500

CAACCTGACT TATCAGTAAT ACTGATAATC CGGTCGGTAA ACAGCGGAAA TATTCCGCTG 1560

TTTATTTCTC AGGGTATTTA TCATGAGACT GCGATTCTCT GT77CACTTT TCT 1613



(2) INFORMATION FOR SEQ ID NO: 112:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 930 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: 

NTAGTCCATG GCCCCATGGA GCGAANTCCA AAGTGTGGAT ATTGTCGTTT TAATTCATCC 60 

CAAAAGCTGA AATACGCCAA AACCCACGTT CCCTAACATT GGTATCATGC ATAATGACCA 120 

CAGCCNTTCA GAAAGCTTTG GCAACCAGCT TTCAAAATCA TGGGTACCGC TTCAAACGTA 180 

TGCAAACCAT CAATATGAAG CAGATCAATG CTACCTTGTG AAAAATGCTC TAACGCTTGG 240

TCAAATGTAC TGCGAATGAG AGTAGAAAAA CCTGAATAGT GCTGTTGATT ATATTCTGAT 300 

ACTTGCCTGT AAACTTCTTC GCCATACAGC CCCGCATGTT CATCTCCCCC CCAGGTATCA 360 

ACGGCAAAGC AGCATGTTTC TAAATCTAGT TTAGAGACTG CTTGGCAAAA TGAGAAATAA 420

GAACTTCCAT AATGAGTTCC CAGCTCAACA ATATTTCTTG GCCGCAGTGT GTCAACTAAC 480

CAGAAAGCAA AAGGAATGTG TTCTAGCCAA GCAGATTGTG CAAGGTATGT AGGACACCAN 540

AAAAGAGATG GTTTGAAAAT GAAATTCAAT TCCCTGCCAA TATCAGTGAT GGGATATAAC 600 

TCACGATTCT CTACTAACTG ACTAATTTTT TGACTATCCA TTGAGGAAAA CTCACATGTA 660 

TTTATAGAAT TAAATCAAGA AACCTGAAAA TACCTATAGT GCGGTAACTT ATTAACTAAC 720

ATTTAAATAT TAACAATACA CTTGGAAATA TTAGTTAAAA ATAAATCATT ATGATTTCTC 780

ATCAATCCTG GTGCTCACGC AAAGTTGCCA GCCCCATAAT AATAAGACCA TAGAACAAGC 840 

AAAGTAATAC ACCCACAGTC GCAAGATTAT AGAATCGCCG TGGATATTCG GCATCTTCCG 900 

CTAAAGTTGG TTGGGTAATA ACCAATAGAT 930 



(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 659 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 

ACGATATCCC CCCTCTGCTT TTGAGAGGCA ATCTGCTTTA ATACATGATT CATCACAACA 60 

CCTCTTGCTG CGCTTTGATC TTAATTTTAT ATTTTTGGGT AGGGAAAAGT AATTGCCCCT 120 

GATACGGCTC ACCATTTACC AACGTTTCAC AGCTATGTTC CAGAGCTAAA TTAAGACCTG 180 

GTAGAATATC CCAGCAATTC ACCCCTTTGA CATTTTCAAA GCTGTCATAA GCACCGGNNA 240

AGGGGGGGCC AACATGTTAT ACATGGAGCA GCCAATGATA CGATATTCAA AGCCCTCTTC 300 

CAGTTGCATC AGATCCTGCT TGGTAASGGA GGAAGAGAGG CGACGAATAC GAGAGCGATG 360 

ATGTGTAATC GGCATACCTG TGATATGAAG ATCATTCAAT TCAGGTAAGA AGATGCAGGA 420

CTCTTGATGT TTCCCCTCGG TGTAAATGCT GATACCAATG CCCCACTCTT TGAGCCCAGA 480

GACAAAGTTT TCTGTGCCAT CAATTGGATC TAGAACAATG TAAGAACCTT TGGGATTCCA 540

CTCAATATCT CCTAAAGGGG CTAATTCCTC TGAAATTAGC ACATGCCCTG GTAGATGCTT 600 

TCTACAGAGT TCGAAAACTA TATCTTGAAC TTTTAGATCC AGTACTGCGG CCGCGATCC 659



(2) INFORMATION FOR SEQ ID NO: 114:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 556 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 

CCCGGATATA CATCAGGAGA AATTGGAGCA GCAATTGGAT GCGCCATTAA TGCCTGGTTA 60 

GGGATCCCCG CATGTGGGCA CGCAAATGGC TCAGAATATG ATCGACCTTC ACCAGATAAA 120 

CCAAATCTGA GCGAACCATT TATCCCAAGA CCCACGTATG ACGCTTCACT TCATTCCTGG 180 

CATGGCGGAT ACTGAGTAAA TCATCCTGAA TCATTATGTT CAACATCATC AATTCTCCGG 240 

ACTTGTTGTC AGATGTCCGG AGAATATTAA CCTTTTCTTC AGAAACAGAW TGATCAAGAA 300 

TCACACTCCT TCTTTAAGAG GATTTTATCC AGAAAACTGA CTTTCTTCTA TCAAAATMAC 360

AGTATCCTGT TTTATCAGGA ATAATCTTTA CCTCCGGTAT CATTCCCATA ATCAGATATC 420

AGAAAAATGT GCCAGTAATT TTTTACTGAT GACTTCAAAC ATTTCACATT CATCACACGT 480

CAGATTACTC CAAAGTTCTT TCAGATATGT GTTCTGCGCC AGAGTGAGTC TCTGAATAAA 540

AAACATACCT TCAGAC 556



(2) INFORMATION FOR SEQ ID NO: 115:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 503 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 
TACCTGTTTG TGGAATTTGA CCCAGAAGTG ATTCATACCA CGACTATCAA CGCGACCCGN 60
GTGTNCAGCC ACTTCGTGCG CTTTGGCGTN CGCAGCGATA GTCCCATCGG CGGTTATTCA 120
TCAGCTATCG GTATATAAAC CGAAAGACAT TGTCGATTCC GGCAACCCCT TATCCGGGTG 180
ATAAGGTGAT TATTACCGAA GCGCGTTCGA AGGCTTTCAG GCCATTTTCA CCGAACCCGA 240
TGGTGAGGCT CGCTCCATGC TATTGCTTAA TCTTATTAAT AAAGAGATTA AGCACAGTGT 300
GAAGAATACC GAGTTCCGCA AACTCTAAAA CGCAATCCCA AACAGTGTTT TGACATTAGC 360
ATCCGTGGTG GCAGCCAGCC ATGCGGCATC TTCTCCACGC CAGTGCGCAA TACGTTGCAA 420
AATATGGGGC AGATGGGCTG GCTCGTTGCG CCGGGATGAN GGCTTTGGCG TGAGATCGCG 480
AGGGAGCAGA TACGGNGCAT GAG 503



(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 433 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: 

TTTAACATCA AAATTACCTG CAGCTGAAAT GATTTTGCTG ATTTCATTAA TTAATGGATT 60 

AAGATTACCC TGACTTCCAT AGGCTAATGC ATCATTCCCA TACACATAAC TTGCCTTATT 120 

ATTACTCTGT TGATACTNAA GTGCCTTTTT AAGGGAATCT GGTGTGATTA CCCTGCCGTC 180 

TTTATCAAAA ATCTGCTCTA TCTGGTGATT AGAGATATCA CCTGACTCTT TTTCAAACCA 240 

GTTTTTAAAT GTAATACCAT TTTTGTGGCC AATGGAAAGA ACATTACCTT CAGCTTTATA 300 

CATGATGAGG TCATTACCTT CTCGCCTGAA GGCCACATCC CGGAAATCAA TATCAGCCAA 360

ACTGAGTTTA TCGTCTTTCC CCCCATCATC GTCAATAATA TGATGGCCAT ATCCTGAAAG 420

ATAACGATAA ATA 433
(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 302 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 
GCGCTCTGTT CCCGTTCCTG TTCATCACCA TCGCCTGTGG TGCGGTATCT GGCTTCCACG 60 

CGCTGATCTC TTCCGGTACG ACGCCAAAAC TGCTGGCTAA TGAAACCGAC GCGCGTTTCA 120 

TCGGCTACGG CGCAATGCTG ATGGAGTCCT TCGTGGCGAT TATGGCGCTG GTTGCTGCGT 180 

CCATCATCGA ACCGGGTCTT TACTTCGCGA TGAACACCCC GCCTGCTGGC CTTGGCATCA 240

CCATGCCTAA CCTGCATGAA ATGGGGTGGC GAGAACGCGN CGGATTCATC ATGGCGCANT 300 

GA 302



(2) INFORMATION FOR SEQ ID NO: 118:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 656 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: 
AATTAATAAG CCAAATACTA CATCACGTAA TACTTGCAAA GAAGTGCGTG GAGTTTGACT 60 

AATAATGGGT TTGTCCATTA ATACTTACCC AAATAATCGG CTCATTATAG CAACGAGCCT 120

CCGATTAAAA TTTAAAATAC TCAATCATTT AATAGCAACG TTAGCAGCTA CAGCGATTTG 180 

ATAAATAATT TGTGTGATAT CTTTAAATGA TTGCATGGTT TTGCTATCAA CCTGAGGTAG 240

AACCAATATC TGATCCCCCG GTTGTACTTT ACCTTGCCCT TTAAATTCTA CAAGACCATT 300 

TGCATGTACA ATAGCAATTC GCTTGTCGTT AGCTCGCTCA GTAAAACCTC CGGCCCATGC 360

AACATAATCA TCCAAATTAG CATCGGCATT ATATACTACT GCTTGTGGCA TCAACACTTC 420

ACCCCCCACT TGAATAAGAT CAGTCTTATT TGGAATAACT ATTTGATCGC CTTGTTCTAA 480

TTGGATAWTG GCAATAACAC CTTTATCTGC AACTACTACT TTACCAAGCG GTKGAACTTT 540

ACGAGCCTTT YCAACAAACT GCATCACTAA CTCTGCTTCT TTAGCACGTA TATTCGCCTC 600 

ACCATCAGAT CGCGCGGGTG TGGTAAANTT CATACGTTCC AAGCGGTTTA GAGATT 656 
(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 436 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119: 



ATATGTTATC TGGATCCAGA TAAAGAGCGT TCTTGACCCG CTATATCCAG ACAGGTCAGT 60

TACACCCTGT CCGGAAAAAC TGATCGGAAT AACAACAGTA TATTTTCTAA TACACTGGCA 120

AATGGTGCCG GCGGTGTGGG GATTCAGCTT CTGGATAGCG CTGGTAATGC GGTTGCTGCT 180

GGACAGAAGA AATATCTGGG ACAGGTAGGA CCATCAACAT CTCTCAATAT TGGATTAAGG 240

GCATCTTATG CACTGACCAA TGGACAGACT CCACCTACTC CCGGACGAGT TCAGGCGTTA 300

GTTGATGTTA CCTTCGAGTA TAATTAGGAA TGTCGGGGAT GGGCTATCCC CGATATTATT 360

GCAGGATTAG TCTGTGATAC AGATATACAG CCCATATGAA CAACTGTTTG CATATATAAA 420

AATGATGATA ATTTTA 436
(2) INFORMATION FOR SEQ ID NO: 120:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 559 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120: 

AATAATTAAA TTTGGAGGGA TCAGTTTTCT GATAATGTTC TGTTATTAAA ACATTATCCC 60 

ATGGGGCGTA GTTATATCAA TTAGCAGGAT CTTATGAGTT AACTAACATC AGTTTTGAAT 120 

TTTTAATGGG GGTAATTTAT CTTTTACTAA AAATATTTTA ACTATTAATA TAGCATCATG 180 

GTTGTTACGG TTTGTTTTAA TTCTATTTTA TAATGTGCTA TATATTGTAT TTTTGTGCTT 240

AGATAAATAT GTTTTTTCAT TACTTTAGTG ATGTTAATAT TTTGCGTGTA GTAAAAATCA 300 

TTGTTATAAC AAATGTCACT GTTGCTATAC TTTGCTGAAC TGTTTATCGG TCATTTTGAT 360

TCAATCACTG GTTCTATATT TTTTAATAAC CGTTCTGTAG CGATTAATAT ATTGCTCTCC 420

AGAGGATACA CTATATGAAA TATATTAAAA GTCATTAATT TTNATTCAAT GTTGTTTAGA 480

GTTATGTTCA GTGTTTGGNA ATAGGATGTG TTTCTAAACC GTCTTGGGTT CTATAATAAA 540



TTCTATTCTT ANAGGTTTT 559

(2) INFORMATION FOR SEQ ID NO: 121:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 481 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 

CATGTCCCTT CCTGAATACT GGGGAGAAGA GCACGTATGG TGGGACGGCA GGGCTGCTTT 60 

TCATGGTGAG GTTGTCAGAC CTGCCTGTAC TCTGGCGATG GAAGACGCCT GGCAGATTAT 120

TGATATGGGG GAAACCCCGG TACGGATTTA CAGAATGGTT TCTCCGGACC TGAAAGAAAA 180

TTCAGCCTCC GGCTCAGGAA TTGTGAATTT AACAGTCAGG GTGGGAACCT TTTCTCTGAT 240

TCCCGGATAA GGGTGACTTT CGATGGCGTC CGGGGTGAAA CGCCGGATAA GTTTAATTTA 300 

TCCGGTCAGG CAAAAGGCAT TAATCTGCAG ATAGCTGATG TCAGGGGAAA TATTGCCCGG 360 

GCAGGAAAAG TAATGCCTGC AATACCATTG ACGGGTAATG AAGAAGCGCT GGATTACACC 420 

CTCAGAATTG TGAGAACGGA AAAAAACTTG AAGCCGGAAA TTATTTTGCT GTCTGGGATT 480 

A 481 
(2) INFORMATION FOR SEQ ID NO: 122: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 535 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: 



CCATATAGTG ACTTCATTGA ACAAAATGTA AATGGAATCT TGCTGGAGAA TGACCCACAT 60

ATATGGATAA AAGCTCTTTC ATTACTTGTT AGTGCAGATC ATAAACGTAG CGAGTTGGCG 120

TTCAATGCTA AAAAATATGC TTGTAAAATT GTAGGTGTCG AGTAAAAAGA TATTTTTATT 180

TAATTGGTGC TATTGAATGT TTAAAAATCG AACTGATTGG TGTTTTAATA TTAATCATAG 240

GTTATGATGC AAAAATATAT TAGGCATTGC CTGCTTCAAT TAACTTGAGA GTGTAAGTTG 300

AATTGAAATA TGGTTATATG ATAAAGCAAT ATATGTTAAT ACATATGTCA ACCGAAAATG 360

CCATTATGTG TTTTTTACTT TATCTGTAAC GACACAATAT ATAAAATAAG GCTAATAATC 420

AAAACGCTTT TTAATTTGAT TGTTTTGAAT CAAGTGACTA AGAAATTCTC TTGCTGCAAA 480

TAACTCCCTT AGTGATTTTT TTTGAGTCTA TTTTATTCTC TGGGCATGGT CATGC 535



(2) INFORMATION FOR SEQ ID NO: 123: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 412 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123: 

CCGGCCCCAT AATGATGGTT TTATTAAGGT TAGCGCCGAC GGTTTCGATG AACGATTTCA 60 

GGTCGGTATC TTTAAAATTA GCGGTGAAAG TGGCTTCTTC CGCCCAGACC GGTGAACTGC 120 

ATAATGCCGC TGCCAGCACC AGCGGCAGTA AACGCTTTTT TGTTTTGAGG CCAGTTGTCT 180

TCTTACGCCA GACCGACAAC GTCATATCAC GCCAAAACAC GATGAATGAT TCTCCTGGAT 240

TAAATGCGGT TAGCGCAGCG CGATGGAAAT GTCGTGGCGC GCACCCTTGC GTAAAACCGT 300 

AAGTTGAATG GAATCCATTG AAGGTAACTG CCGCATCAGA GCAATCATTG CTCGTGGATC 360

AGTGAAATCC TGCTGATTTA GCGCAAATGC GATATCGCCT TCCTTAAAAC CG 412 
(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:

TAGCCTGTTC AGCGTATATT TGGGATGAGA AGCCAAAGTG GCTTTGGTGG TGTCCCAGCC 60 

CAGGTTTTTA TTACTGCTGG TTATTTACCT TTCATGTTTT TCAATAAAGT TGTGACTCAG 120 

TTGAAATCTG CTGTCAATGC TAATATGGGA CTTTTTTGTT ATAGACAAGT GACTCCTTTT 180 

GCAACTTTTA TAGCACGTTT TATGCTAGAA ACAATGGTGG GCATGATTGT CGGTATAATC 240

CTAGTACTAG GATTATTGTG GTTTGGCTTT GATGCAATAC CTGCGGATCC ATTGCAAGTG 300

ATCCTTGGTT ATTCTCTTCT GATGCTGTTT TCTTTTTCTC TTGGTATTGT ATTTTGTGTT 360 

ATTTGTAATT KRGCGARAGA GGCAGATAAA TTTCTTAGCT TGTTAATGAT GCCTTTGATG 420

TTTATCTCTT GTGTTATGTT TCCTCTTGCT ACTATTCCCC CTCAATATCA GCATTGGGTT 480 

TTTATGGAAT CCACTTGTGC ATGCTGTAGA ACTAATCCGA AGGGCATGGG ATATCTGGGT 540

TATCGTAGTC CTGATGTAAG TTGGGCGTAT CTGTCG 576



(2) INFORMATION FOR SEQ ID NO: 125: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 132 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125: 
TTACCAAGCA GGATCTGATG CAACTGGAAG AAGGCTTTGA ATATCGTATC ATTGGCTGCT 60 
CCATGTATAA CATGTTGGCC GCCGTACGCG GTGCCTATGA CAGCTTTGAA AATGTCAAAG 120 
GGGTGAATTG CT 132 
(2) INFORMATION FOR SEQ ID NO: 126: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 542 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126: 

GATTAGGGGT CACTCAGGAT TATAAAAAAG CGGCAGAATA CTATAAAAAA GGTGATAAAA 60 

ATAATGATAT TACAGCACAA TACCGTCTGG CAAAACTTTA TGAACAAGGT AACGGTGTAA 120 

AACGTGATTA TCAACAAGCG ATAAACCTTT ACCTTAAACA TATCAACAGA ATGGATCACA 180

TCACTGCCCC CAGTTTTGTG GCTCTGGGTG ATATCTATTC TCTGGGATTS GGGGTAGAGA 24 0 

AAAACCCACA ACTGGCTGAA AAATGGTATC AAAAAGCGAT AGATGCAGCT AATACACAAC 300

ATAACCAGGA AATAAATCAT TAAACGACAA CACTTAATAC CATATTGTGA AGATGTTCAG 360 

ACATGGCGGA ATTCCCCTAT TCTTTGTTGG CGCTTACAAC AGACTATATT CCGCCATATC 420

TGTCTTTATT GTGTATAAAC CATCGATACT GATGTTTGAT AGTGCTAAAT AATCATTGGC 480

GCAATCACAA AGCCTAATGC CACTCCAGCA ATAATTCCCC CCAACCCAGG CAGCATAAAT 540



(2) INFORMATION FOR SEQ ID NO: 127: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 382 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127: 



GAACCACTTA GCGGCAGCTA TCGGGAATCG CCTGCTGAAA GACGGTCAGA CAGTGATTGT 60

GGTTACCGTG GCTGATGTTA TGAGTGCCCT GCACGCCAGC TATGACGATG GGCAGTCAGG 120

CGAAAAATTT TTGCGGGAAC TGTGCGAAGT GGATCTGCTG GTTCTTGATG AAATTGGCAT 180

TCAGCGCGAG ACGAAAAACG AAGCAGGTGG TACTGCACCA GATTGTTGAT CGCCGGACAG 240

CGTCGATGCG CACGTGGGGA TRCTGACAAA CCTGAACTAT GAGGCCATGA AAACATTGCT 300

CGGCGARCGG ATTATGGATC RCATGACCAT GAACGGCGGG CGATGGGTGA ATTTTAACTG 360

GGAGACTGGC GTCCGAATGT CG 382



(2) INFORMATION FOR SEQ ID NO: 128: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128: 
CGTCCCGCAC CCGGAAATGG TCAGCGAACC AATCAGCAGG GTCATCGCTA GAAATCATCC 60
TTAGCGAAAG CTAAGGATTT TTTTTATCTG AATTCTAGCC AGATCCCCGC TGATTTATGC 120
TGGTTA 126



(2) INFORMATION FOR SEQ ID NO: 129:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 258 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129: 
ACCCCCAGCC TAGCTGGGGG TTT7CTGTGC ACAAAAAATC CCGGCATAAT GGCCGGGATT 60

TGCGAGCTTT CCCACTATTT CTTGATTCCT AAACGGAACA TATCAGTTGG GAATAAAGGT 120 

TGTATTATCA CTTCATCATT ANAAATGAAT AATTTGGGCG ATAAAGCTGT TACGTCATAG 180

ATATTTTCAG CGATTAATCT TAGANTTGAC CTAAAAACTG GAATACTTGC ATCATCTGCA 240

AAGACAAACA TGTCATCG 258

(2) INFORMATION FOR SEQ ID NO: 130: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 399 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130: 

AACCAGCGGT TCGCATCATC TCATCCCACT GACTCTCCGC TTTTGACAGA TCTGCATATC 60 

CTCGGGCCAA CTTATCCAGT ACTCCGTAGT TTGCCGATTT ATTCACCCGC CAGAACACCG 120 

CCTCACCTGC ATCGGCAAGC CGGGGGGAAA ACTGATACCC CAGTAGCCAG AACAGACCGA 180 

AAATAATATC GCTGCTACCC GCAGTGTCTG TCATGATTTC AACTGGATTC AGCCCTGTCT 240

GCTGCTCAAG AAGTCCTTCC AGTACAAAAA TCGAATCCCG TAATGTACCG GGTACCACAA 300 

TGCCATGGAA CCCAGAGTAC TGATCAGATA CGAATTATAC CAGGTGATGC CTCGTCCAGA 360 

ACCAAAATAT TTTCTGTTAG ATCCTGAGTT GATGGTCTT 399 
(2) INFORMATION FOR SEQ ID NO: 131: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 745 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131: 

AAATAACATC AACATACATT TGACTCGCGG GGGAAACGTT TACGGAGTCT TCATACTGGC 60 

ACTTTTTTAT GCTGCTGACT ACTCTTCGTC ATCGCCATCA ACATGCGCAC GAATCAGCGC 120 

CATAAACGGT TTGCCAAAGC GTTCCAGCTT GCGCATCCCA ACGCCGTTAA CGCTGAGCAT 180 

TTCGCTGGCG GTGATCGGCA TCTGTTCAGC CATCTCAATC AAGGTTGCGT CGTTAAACAC 240

CACGTACGGC GGGACATTAC TTTCATCGGC TATCGATTTA CGCAGTTTGC GTAATTNGGC 300 

GAACAGTTTG CGATCATAGT TGNCGCCGAN CGATNTCTGC ATCGCTTTCG GTTTGAGCGC 360 

CACGATACGC GGCACGGCAA TTGCAAAGAG GATTCGCCGC GCAGCACCGG GCGCGCGGCC 420

TCTGTCAGTT GTAGGGCAGA ATGCTGGGCA ATATTTTGCG TCACCAGGCC GAGGTGAATC 480

AGCTGGCGGA TCACGCTCAC CCAATGTTCA TGGCTTTTAT CACGGCCCAT GCCATAGACT 540

TTCAGTTTGT CATGACCATA GTCGCGGATA CGCTGGTTAT TAGCACCACG AATCACTTCC 600 
ACCACATAAC CCATCCCAAA CCGCTGATTC ACACGACCAA TGGTGGAAAG GGCAATCTGA 660 
GCATCGGTTG AACCGTCGTA CTGTTTCGGC GGATCGAGGC AGATATCGCA GTTCNCCGCA 720
CGGCTCCTGA CGCCCTTCGC CAAAA 745

(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 439 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132: 
AGAATGGCGG CTTCTTGCCC CCCTTTGCCC CGGTCCTGAC TAGCATGGCT GGAGTCCAGT 60 

GTCCAGGCCA CGACCATGCT CATCATGGAA GCAGCTTTTG TAGTACANTC GCAGCTTATT 120 

TTCCTGGAAC GAAATGTCTG GCATCGTGGT GCATAACATA ACCCCCAATG CCCAGCAGAT 180 

GCACAGAAGG TTCTAGAATC GCCCACTGAT ATCCCATACA AAATTTACCA AAACGTGTTC 240

GTATTTCTCG TATAAATAAT GTCTCTATGG TGACGTTCTA GACTTCAAAC CCACTTTTTG 300 

AATTTGATGA TGTGCTCCTA ATCTCTTCAG GAATGTAACG CCCTTGGTTT ACAGCTACCA 360

ATACACTGGA GGTATACTTA TCTGCAACTG GATGAACTAG ATGTACTTGA GCAAACATTT 420

CATAAGCTCG ACGACAGTT 439 
(2) INFORMATION FOR SEQ ID NO: 133: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 350 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133: 

CTGGAAAGCG ACGTTGATGG ATTAATGCAG TCGGTAAAAC TGAACGCTGC TCAGGCAAGG 60 

CAGCAACTTC CTGATGACGC GACGCTGCGC CACCAANTCA TGGAACGTTT GATCATGGAT 120

CAAMTCATCC TGCAGATGGG GCAGAAAATG GGAGTGAAAA TCTCCGATGA GCAGCTGGAT 180 

CAGGCGATTG CTAACATCGC GAAACAGNAC AACATGACGC TGGATCAGAT GCGCACCGTC 240

TGGCTTACGA TGGACTGAAC TACAACACCT ATCGTAACCA GATCCGCAAA GAGATGATTA 300 

TCTCTGAAGT GCGTAACAAC GAGGTGCGTC GTCGNATCAC CATCCTGCCG 350 
(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 400 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134: 

CCCCAAGATT GCTAACAAAT GCGCGTTGTT CATGCCGGAT GCGGCGTGAC CGCCTTATCC 60 

GGCCTACGAA ACCGCAAGAA TTCAATATAT TGCAGGAGCG GTGTAGGCCT GATAAGCGTA 120 

GCGAWTCAGG CAGTTTTGCG TTTGCCCGCA ACCTTAGGGG ACATTTAGCG ACCCCATTTA 180

TTTCTCACTT TTCCGCCTCA TCATCGCGCG TTAATTTCTT TCATGAATCA CGCTTTACAA 240

TATCCAGCGC GCGCANAACG GTACTGGCAG GGATCTGAAT TTTCCTCCAG CAGCACAATC 300 

AAATCGACAG CCAGTTTGAC ATCGTCAAGG GGCATTTTCC CAGTGACATA ATCTCTCCAT 360 

TGCTAAGCGG GTTAAAACGC GCTAACCTGT TTCGATTTTT 400



(2) INFORMATION FOR SEQ ID NO: 135: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 463 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135: 

CTATCCTTAT GACCACCCAA CTACNTCATT TACACCCAAA CCAGCGATCT GAATAAAGAA 60 

GCGATTGCCC AGTTACGACT GGGCGGAAAA TGCGCGTAAG GATGAAGTAA AGTTTCAGTT 120 

GAGCCTGGCA TTTCCCTGTG GCGTGGGATT TTAGGCCCGA ACTCGGTGTT GGGTGCGTCT 180 

TATACGCAAA AATCCTGGTG GCAACTGTCC AATAGCGAAG AGTCTTCACC GTTTCGTGAA 240

ACCAACTACG AACCGCAATT GTTCCTCGGT TTTGCCACCG ATTACCGTTT TGCAGGTTGG 300 

ACTGCGCGAT GTGGAGATGG GGTATAACCA CGACTCTAAA CGGGCGTTCC GACCCGACCT 360 

CCCGCAGCTG GAACCGCCTT TATACTCGCC TGATGGCAGA AAACGGTAAC TGGCTGGTAG 420

AAGTGAAGCC GNGGTATGTG GTGGGTAATA CTGACGATAA CCC 463



(2) INFORMATION FOR SEQ ID NO: 136: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 584 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136: 

TTGGTCAGCC GTACCTGAAT GGGGGCTGAT GCCCGGCTGG TTAATGGCAG GTGGTCTGAT 60

CGCCTGGTTT GTCGGTTGGC GCAAAACACG CTGATTTTTT CATCGCTCAA GGCGGGCCGT 120 

GTAACGTATA ATGCGGCTTT GTTTAATCAT CATCTACCAC AGAGGAACAT GTATGGGTGG 180 

TATCAGTATT TGGCAGTTAT TGATTATTGC CGTCATCGTT GTACTGCTTT TTGGCACCAA 240

AAAGCTCGGC TCCATCGGTT CCGATCTTGG TGCGTCGATC AAAGGCTTTA AAAAAGCAAT 300

GAGCGATGAT GAACCAAAGC AGGATAAAAC CAGTCAGGAT GCTGATTTTA CTGCGAAAAC 360

TATCGCCGAT AAGCAGGCGG ATACGAATCA GGAACAGGCT AAAACAGAAG ACGCGAAGCC 420

TACGNTAAAG AGCAGGTGTA ATCCGTGTTT GATATCGGTT TTAGCGNACT GCTATTGGTG 480

TTCATCATCG GCCTCGTCGT TCTGGGGGCG CAACGACTGC CTGTGGCGGT AAAAACGGTA 540

GCGGGCTGGA TTCGCGCGTT GCGTTCACTG GCGACAACGG TGCA 584



(2) INFORMATION FOR SEQ ID NO: 137: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 527 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:

GCAGGCAGGA GGAACTGCCC AGTGATACGG TTATTCGTGA TGGCGGAGGG CAGAGCCTTA 60 

ACGGACTGGC GTTGAACACC ACGCTGGATA ACAGAGTTGA GCATTGGNTA CACGGGGGAG 120 

GGAAAGCAGA CGTTACAATT ATTAACCAGG ATGTTTACCC AGACCATAAA ACATGGCGGA 180 

TTGGCAACCG NAACCATCGT CAACACCGTT GCAGAAGKTG GTCCGGAGTC TGAAAATGTG 240

TCCAGCGGTC AGATGGTCGG AGGGACGGCT GAATCCACCA CCATCAACAA AAATGGCCGG 300 

CAGTTATCTG GTCTTCGGGG ATGGCACGGG ACACCCTCAT TTGCGCTGGT GGTGACCAGA 360 

CGGTACACGG AGAGGCACAT AACACCCGAC TGGAGGGAGG TTAACCAGTA TGTACACAAC 420

GGTGGCACGG CAACAGAGAC GCTGATAAAC CGTGATGGCT GGCAGGTGAT TAAGGAAGGA 480

GGGAACTGCC GGCGCATTAC CACCATCAAN CCNGAAAAGG GAAANCT 527



(2) INFORMATION FOR SEQ ID NO: 138: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 441 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138: 
GTCAGTCTCT GGGGGAAGTG CGTGTTCCGA CCGGGGAAAT GTGGTGGAGA AAGTTATTGA 60 

AGGGGCTTAC GAGGTGG7GG GGG77TTTGA CCGGATTGAG GAAAAGCGTG ATGCCATGCA 120

GTCGCTGATT CTGCCGCGAC CGG-ZGCCAG GCGCTGGCAC AGGCGGCACT GACTTACCGT 180 

TATGGTGACG AACMTCARCC CGTCACCACG GCCGACATTC TGACACCACG ACGCCGGGAR 240

GATTACGGTA AGGACCTGTG GAGTGCTTAT CAGACCATTC AGGAGAATAT GCTGAAAGGC 300 

GGAATTTCCG GTCGCAGTGC CAGAGGAAAA CGTATCCATA CCCGTGCCAT TCACAGGATC 360

GACACCGACA TTAAGGTCAA CCGCGCATTG TGGGTGATGG CTGAAACGCT GCTGGAGAGT 420

ATGCGCTGAT GCCGTTTCCN T 441 



(2) INFORMATION FOR SEQ ID NO: 139: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 398 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139: 

CGAGCGAGAT GAACTTCGAG GGCGGTGTGA GCCAGTCGGC TTACGAGACA CTGGCGGCGC 60 

TTAATCTGCC GAAACCGCAG CAAGGGCCGG AAACCATTAA TCAGGTTACC GAGCATAAGA 120

TGTCAGCTGA GTAAGCCTGT ATGCCGGATA AGGCGCTCGC GCCNATTCCG ATGAAATAAG 180 

GCGCATCGGG CCTGAAGGAA AGCCGTATGN ATACACCCGC AGCCCGCATC CGGCAAGTTA 240

CAACAAATAA CCTTTAACCA TGCTTTTTGA TGTTTTTCAG CAATACCCCG CGGCGATGCC 300 

CATACTGGCA ACCGTCGGGA GGGATTGATC ATCGGCAGTT TTTTGAATGT GGTGATTTGG 360

GCGTTACCCC ATCATGCTGC GCCAACAAAT GGCGGAGT 398
(2) INFORMATION FOR SEQ ID NO: 140: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 580 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140: 

GCCGAACAGA CACAGCAATA TGAACCCTGC CAGCGCAGAC GCTTGCTGAT TAATGCTCTG 60 

AACAAAAGGC GAAGAATGGC AAATCCTGCG ATCAGCAAAG TCAGCGCACC GACTATCTGT 120 

AACATAGTCA CTCCGTGATG AATATCATGT GTATTGTGAA TGCCAGTGAA TGTGGCACTG 180 

AAGCGTTTGC ACCTGTCCGG GTCCCGGTCA TGATGACCGS AACAGAGAGA CAATGCCGAA 240

TTATCAGAAG GTCACATTCA GTGTGGCTTG GCCGTTATAA CCTTCAGCGC TGCTGCCGCT 300 

GACGCTGTGG GCATAACCGG CCTGAACGCC CAGGGTGATA TTTTCCCGGA CACGGGCTTC 360

CAGTCCGGCC TGCAGCTCCA GTGACGTGCC ATTCCGGGAC GGTGAGAACG TCATGTTACT 420

GCCGGCTGCG GCTGTACCCA TGCTCATGTC TCCCCGGGAG CTGAAGGTGC GGATAACAGA 480

AGGCTGTACC CACCCGTTCA CCGGCAGTTC ACGGACACTG TGTTTTGCAC TGTCACGCAA 540

GGTGTCACGG GATGAGGTGC CTTCAMCAAA AGGTCATATT 580
(2) INFORMATION FOR SEQ ID NO: 141: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 446 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141: 

TGCGGACATC CAGCGTTCCG CCATCATCCA CACGGGTTCT GGTGGCTGTG TGTCCGGTCA 60 

GCACATCCAG ACGGCCGCCA TTTTCCAGTA CGACATTATC AGCTTTACCC TCCACAACAG 120 

AGAATGCTCC CAGGCGGTTT GTGCCGGTGA CGGTTGCAGC AGTGCTGGTA ACCAGTGCTC 180 

CGCCCGTGTT CTGGGTGACA TCAGACGCTT TACCGCCGGC ATTCACCTGC AGCTTTCCTT 240

TCTGGTTGAT GGTGGTATGC GCGGCAGTTC CTCCTTCCTT AATCAMCTGC CAGCCATCAC 300 

GGTTTATCAG CGTCTCTGTT GCCGTGCCAA CGTTGTGTAC ATACTGGTTA MCTCCCTCCA 360 

GTCGGGTGTT AWGTGSCTCT CCGTGTANCG TCTGGTCANC AACAACGCAA ATGANGGTGT 420 

CCCGTGCCAT CCCCGAAGAC CAGTAA 446 
(2) INFORMATION FOR SEQ ID NO: 142: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 327 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142: 

TGAATACGTT AAGTCAGCAG ACCGGCGGAG ACAGTCTGAC ACAGACAGCG CTGCAGCAGT 60

ATGAGCCGGT GGTGGTTGGC TCTCCGCAAT GGCACGATGA ACTGGCAGGT GCCCTGAATA 120 

ATATTGCCGG AGTTCGCCAC TGACCGGTCA GACCGGTATC AGTGATGACT GGCCACTGCC 180 

TTCCGTCAAC AATGGATACC TGGTTCCGTC CACGGACCCG GACAGTCCGT ATCTGATTAC 24 0 

GGTGAACCCG AAACTGGATR GTCTCGGACA GGTGGACAGC CATTTGTTTN CCGGACTGTA 300 

TGAGCTTCTT GGAGCGAAAC CGGGTCA 327 

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)



A. The indications made below relate to the microorganism referred to in the description
on page 5, line §



B. IDENTIFICATION OF DEPOSIT



Further deposits are identified on an additional sheet [X]



Name of depositary institution 

American Type Culture Collection (ATCC) 



Address of depositary institution (including postal code and country) 

12301 Parklawn Drive
Rockville, Maryland 20852 
United States of America 



Date of deposit 

September 23, 1996 



Accession Number 



97726 



C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet [ ]



DNA plasmid PAI-1 



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States) 



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 



The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")



For receiving Office use only 



[ ] This sheet was received with the international application



Authorized officer 



For International Bureau use only 



[ ] This sheet was received by the International Bureau on:



Authorized officer 